INTRODUCTION

Ranking and selection procedures, subset selection procedures in particular, provide in a realistic manner attractive ways of handling problems that are commonly treated by the 2-decision procedure of a global F-test, and the many-decision procedure of a typical multiple range test. Consider the common one-way layout situation in analysis of variance. Usually the experimenter wants to know more than just whether all the treatment effects are equal, but he may not want to make inferences concerning all pair-wise differences of means, or all linear contrasts of means. One of the more frequently occurring situations for which this is so is where the experimenter simply wishes to know which of the treatments gives the best product. In this situation, formulating the problem as a selection problem is appropriate. Subset selection procedures are often thought of as screening procedures. If the data indicate several treatments are better than the remaining treatments but no treatment is clearly the best, then perhaps the experimenter ought to retain all of the better treatments for future consideration. Of course the concept of subset selection has a much wider scope of application than just one-way layouts. We shall give two examples to show how situations for which the subset selection formulation is appropriate arise in practice.

The first example is adapted from an experiment conducted here at Purdue. Suppose we wish to determine how people perceive various colors. For instance, would most people perceive red as being hot? gray as being cold? Results of this type of experiment have been applied in practice. For example, a certain fast food chain paints all its restaurants in certain colors because studies have shown that people tend to leave the premises more quickly if the premises are painted in those colors. So suppose n experimental subjects are chosen and there are k available colors. Corresponding to each adjective of interest, each subject chooses the one of the k colors that he or she perceives to fit the adjective most closely. Then for this experiment the underlying distribution is the multinomial distribution with the number of observations equal to n and the number of cells equal to k. For the type of application mentioned, one wants to select the color or colors corresponding to the cell with the highest frequency of occurrence. Hence the subset selection formulation is appropriate.
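As an illustration only (the rule and the margin D below are hypothetical, not a procedure taken from the thesis), one simple way to turn the color experiment into a subset selection rule is to keep every color whose count is within D of the largest count. The sketch applies such a rule and estimates by simulation how often the truly most popular color is retained, assuming NumPy is available and using made-up cell probabilities.

```python
import numpy as np

def select_cells(counts, D):
    """Return the (0-based) indices of cells whose count is within D of the maximum."""
    top = max(counts)
    return [i for i, c in enumerate(counts) if c >= top - D]

def prob_correct_selection(n, probs, D, n_sim=20_000, seed=0):
    """Monte Carlo estimate of P(the most probable cell is selected).

    n: number of subjects; probs: hypothetical true cell probabilities.
    """
    rng = np.random.default_rng(seed)
    best = int(np.argmax(probs))
    samples = rng.multinomial(n, probs, size=n_sim)  # one row of k cell counts per replicate
    hits = sum(best in select_cells(row, D) for row in samples)
    return hits / n_sim

# Hypothetical data: 4 colors chosen by 74 subjects.
print(select_cells([12, 30, 27, 5], D=4))  # [1, 2]
```

A larger D keeps more colors and can only raise the estimated probability of retaining the best one, at the price of a larger selected subset.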

The second example arose in the field of Bionucleonics. This was an actual experiment that the author came across in consulting. In manufacturing radioactive trace elements, carrier particles were to be made from a petroleum base by the use of heavy pressure. When injected into the patient's body, particles that are too large get absorbed by the wrong organs and those too small go out of the system. There were four possible pressure settings. To each pressure setting there corresponds a particle size distribution. The object here was to select the pressure setting such that, for the same total amount of radiation, the amount of radiation attached to particles in the desirable size range is the largest. Hence the subset selection formulation is appropriate. Notice that here the parameter of interest can be an extremely complex function of the theoretical particle size distribution.

Heuristically proposed subset selection procedures of Gupta (1956, 1965) have been in existence for some time. For related work and thinking along subset selection lines, reference should be made to Paulson (1945) and Seal (1955). However, unlike the F-test and the multiple range tests, the use of these procedures in practice has been virtually nil. This, the author believes, can be attributed to two main reasons:

1) No computer packages exist to facilitate the use of these subset selection procedures. None of the commonly used statistical packages (e.g., SPSS, BMD) includes subset selection procedures as part of the package.

2) Research concerning the performance of these heuristically proposed procedures is inadequate. Potential users cannot, in general, be guaranteed any optimality properties of these procedures.

It is generally recognized that for multivariate problems uniformly best procedures usually do not exist. In fact, in most of the situations of practical interest, there do not even exist uniformly best unbiased procedures. Hence it is reasonable to look for procedures that do well on the average, averaged over the parameter space by some prior. This approach has been taken in the first part of the thesis. The essentially complete class of Bayes procedures and their limits is investigated. The concept of Total Monotone Likelihood Ratio is introduced as the multivariate analog of univariate monotone likelihood ratio. Then a multivariate analog of the classical univariate result of Karlin and Rubin (1956), that monotone procedures form an essentially complete class, is proved for a loss function which seems natural to the subset selection problem, by proving that Bayes procedures are monotone.

Bayes procedures typically require numerical integrations to implement, and this makes them unsuitable for practical use. Besides, the use of Bayes procedures is by no means universally accepted. So if there is available an easy-to-implement procedure whose performance is close to that of the Bayes procedure, then this procedure ought to be used.


This possibility is explored in Chapter 3 for the case of normal populations and normal exchangeable priors. As it turns out, Gupta's procedure is good compared to the Bayes procedure throughout the range of the normal prior, while Seal's procedure is good only when the normal prior is concentrated, that is, when the normal prior is very informative. As of yet we do not know how these procedures perform when the priors are not normal, in particular when the priors have longer tails than the normal distribution. But from what we know, Gupta's procedure seems to be the logical choice when the observations arise from normal distributions.
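For reference, the sketch below implements the classical form of Gupta's rule for p normal populations with a common known variance σ² and n observations per population: retain population i whenever its sample mean is at least max_j X̄_j − dσ/√n. The constant d is chosen so that the probability of a correct selection is at least P* when all means are equal (the least favorable configuration for this rule); here it is estimated by Monte Carlo rather than read from published tables. The known-variance setup and the function names are assumptions for illustration, not the exact setting studied in Chapter 3.

```python
import numpy as np

def gupta_constant(p, p_star, n_sim=200_000, seed=0):
    """Monte Carlo estimate of Gupta's constant d for p normal means.

    With all means equal, P(correct selection) = P(max_{j<p} Z_j - Z_p <= d)
    for Z_1, ..., Z_p iid N(0, 1), so d is the p_star-quantile of that difference.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sim, p))
    w = z[:, :-1].max(axis=1) - z[:, -1]
    return float(np.quantile(w, p_star))

def gupta_select(means, sigma, n, d):
    """0-based indices i with mean_i >= max_j mean_j - d * sigma / sqrt(n)."""
    means = np.asarray(means, dtype=float)
    cutoff = means.max() - d * sigma / np.sqrt(n)
    return [i for i, m in enumerate(means) if m >= cutoff]

d = gupta_constant(4, 0.95)  # roughly 2.9 for p = 4, P* = 0.95
print(gupta_select([10.2, 11.0, 9.1, 10.8], sigma=1.0, n=5, d=d))  # [0, 1, 3]
```

For p = 2 the constant has the closed form d = √2 Φ⁻¹(P*), about 2.326 when P* = 0.95, which gives a quick check on the simulation.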

There are heuristically proposed procedures for many other distributions in the exponential family of distributions. Little is known concerning the performance of these procedures; they really have to be investigated case by case. But in the case where the parameter of interest is a location parameter and the underlying distribution is not entirely known, there are known good robust estimators of the parameter. Under mild regularity conditions they are asymptotically normal. From the results of Chapter 3, one would expect Gupta's procedure based on these robust estimators to be asymptotically good. In Chapter 4 of the thesis, robust and nonparametric versions of Gupta's procedure are proposed and their performance studied. One procedure in particular, the procedure based on simultaneous confidence bounds derived from rank tests, is a truly nonparametric subset selection procedure which controls the infimum of the probability of a correct selection for any sample size. Because it is based essentially on the Hodges-Lehmann estimator, it has good asymptotic performance.
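The Hodges-Lehmann estimators mentioned above are simple to compute. As a sketch (the exact form used in Chapter 4 is not shown in this part of the text), the one-sample estimator of location is the median of the Walsh averages (x_i + x_j)/2 with i ≤ j, and the two-sample estimator of shift is the median of all differences y_j − x_i.

```python
from itertools import combinations_with_replacement, product
from statistics import median

def hodges_lehmann_location(xs):
    """One-sample Hodges-Lehmann estimator: median of Walsh averages (x_i + x_j)/2, i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))

def hodges_lehmann_shift(xs, ys):
    """Two-sample Hodges-Lehmann estimator of the shift of ys relative to xs."""
    return median(y - x for x, y in product(xs, ys))

print(hodges_lehmann_location([1, 2, 3, 100]))  # 2.75
```

For the sample [1, 2, 3, 100] the sample mean is 26.5, while the Hodges-Lehmann estimate stays at 2.75, illustrating its robustness to a single outlying value.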
CHAPTER 1

SOME DECISION-THEORETIC PRELIMINARIES AND KNOWN RESULTS

In this chapter we give some decision-theoretic preliminaries and list some known results, particularly those applicable to finite action problems. Although we shall make use of only one of the results in this thesis, namely that Bayes procedures and their limits form an essentially complete class, it seems desirable to have the important results listed in an orderly fashion for the benefit of future workers in the field. We want to emphasize that these results pertain to all finite action problems. Hence they are applicable to the classification problem, the identification problem, the complete ranking problem, the treatments versus control problem, the selection problem using the indifference zone approach, and the selection problem using the subset selection approach. We follow throughout the development in Brown (1974).

We begin by describing in a mathematically precise fashion the formulation of the statistical decision problem.
Notation. We denote by $\mathcal{P}(\mathcal{B})$ the set of all probability measures on the σ-field $\mathcal{B}$.

Definition 1.3. A parametrized set of possible distributions is a measurable map from $\Phi$ to $\mathcal{P}(\mathcal{B}_S)$. We denote the value of this map at a pair $\phi\in\Phi$, $S\in\mathcal{B}_S$ by $F(S\mid\phi)$. Note that to say $F(\cdot\mid\cdot)$ is $\mathcal{B}_\Phi$ measurable means:

(1) For each $\phi\in\Phi$, $F(\cdot\mid\phi)$ is a probability distribution on $\mathcal{B}_S$.

(2) For each $S\in\mathcal{B}_S$, $F(S\mid\cdot)$ is a measurable map of $(\Phi,\mathcal{B}_\Phi)$ into $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. ($\mathcal{B}(\mathbb{R})$ denotes the Baire σ-field on $\mathbb{R}$, the reals.)

The set of distributions $\{F(\cdot\mid\phi):\phi\in\Phi\}$ is called the set of possible distributions.

Remark 1.1. It is possible to parametrize any set of distributions. Suppose $\mathcal{F}\subset\mathcal{P}(\mathcal{B}_S)$ is a set of probability distributions. One can set $\Phi=\mathcal{F}$ and define $\mathcal{B}_\Phi$ to be the σ-field consisting of all subsets of $\Phi$. This definition of $\mathcal{B}_\Phi$ guarantees that $F(\cdot\mid\cdot)$ is measurable.

Notation. If the family of distributions $\{F(\cdot\mid\phi):\phi\in\Phi\}$ is dominated by some σ-finite measure $\mu$, then $f(\cdot\mid\phi)$ denotes a version of the density $dF/d\mu$, that is,

$$F(S\mid\phi)=\int_S f(s\mid\phi)\,d\mu(s).$$

Note that if $(S,\mathcal{B}_S,\mu)$ is separable then $f$ may be chosen to be a measurable function from $(S\times\Phi,\mathcal{B}_S\times\mathcal{B}_\Phi)$ to $(\mathbb{R},\mathcal{B}(\mathbb{R}))$.

Definition 1.4. The action space is a measurable space $\mathcal{A}$ with σ-field $\mathcal{B}_{\mathcal{A}}$.
Definition 1.5. A decision procedure $\delta$ is a $\mathcal{B}_S$ measurable map from $S$ to $\mathcal{P}(\mathcal{B}_{\mathcal{A}})$. We shall denote the value of $\delta$ at $s\in S$, $B\in\mathcal{B}_{\mathcal{A}}$ by $\delta(s,B)$. Note that $\delta$ measurable means

(1) For each $s\in S$, $\delta(s,\cdot)\in\mathcal{P}(\mathcal{B}_{\mathcal{A}})$.

(2) For each $B\in\mathcal{B}_{\mathcal{A}}$, $\delta(\cdot,B)$ is a measurable function from $(S,\mathcal{B}_S)$ into $(\mathbb{R},\mathcal{B}(\mathbb{R}))$.

Notation. $\mathcal{D}$ denotes the set of all decision procedures.

Definition 1.6. The set of available decision procedures, denoted by $\mathcal{D}_0$, is a subset of $\mathcal{D}$.

Remark 1.2. Some examples of $\mathcal{D}_0$ are the class of invariant procedures, the class of monotone procedures, etc.

Definition 1.7. The loss function $L$ is a measurable function from $(\Phi\times\mathcal{A},\mathcal{B}_\Phi\times\mathcal{B}_{\mathcal{A}})$ to $([0,\infty],\mathcal{B}([0,\infty]))$.

Definition 1.8. The risk function of a procedure $\delta$ is the function $R(\cdot,\delta):\Phi\to[0,\infty]$ defined by

$$R(\phi,\delta)=\int\!\!\int L(\phi,a)\,\delta(x,da)\,F(dx\mid\phi).$$

Definition 1.9. Let $T=\{t:\Phi\to[0,\infty]\}$ have the weak (Tychonoff) topology defined by $t_\alpha\to t$ if and only if $t_\alpha(\phi)\to t(\phi)$ for all $\phi\in\Phi$. For $V\subset T$ let $\tilde V=\{t: t\in T,\ \exists\, t'\in V \text{ such that } t'\le t\}$, where $t'\le t$ means $t'(\phi)\le t(\phi)$ for all $\phi\in\Phi$.

Note: $T$ is compact Hausdorff.

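Notation. For $\mathcal{D}_0\subset\mathcal{D}$ let $\mathcal{R}(\mathcal{D}_0)$ denote the set of all risk functions corresponding to $\mathcal{D}_0$.

Definition 1.10. For any non-negative measure $P$ on $(\Phi,\mathcal{B}_\Phi)$ and $\delta\in\mathcal{D}$, define the integrated risk $B(P,\delta)$ by

$$B(P,\delta)=\int R(\phi,\delta)\,P(d\phi).$$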
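When $S$, $\Phi$ and $\mathcal{A}$ are all finite, the integrals in Definitions 1.8 and 1.10 become sums, and both quantities can be computed with a few array operations. The numbers below form a hypothetical two-parameter, three-outcome, two-action problem chosen only for illustration; arrays are indexed as F[φ, s] = F({s}|φ), delta[s, a] = δ(s, {a}) and L[φ, a] = L(φ, a), and NumPy is assumed.

```python
import numpy as np

# Hypothetical finite problem: 2 parameter points, 3 sample points, 2 actions.
F = np.array([[0.7, 0.2, 0.1],      # F({s} | phi_0)
              [0.1, 0.3, 0.6]])     # F({s} | phi_1)
delta = np.array([[1.0, 0.0],       # delta(s_0, .): take action 0
                  [0.5, 0.5],       # delta(s_1, .): randomize evenly
                  [0.0, 1.0]])      # delta(s_2, .): take action 1
L = np.array([[0.0, 1.0],           # L(phi_0, a)
              [1.0, 0.0]])          # L(phi_1, a)

def risk(F, delta, L):
    """R(phi, delta) = sum_s sum_a L(phi, a) delta(s, a) F(s | phi), one value per phi."""
    return np.einsum("pa,sa,ps->p", L, delta, F)

def integrated_risk(prior, F, delta, L):
    """B(P, delta) = sum_phi R(phi, delta) P(phi)."""
    return float(np.asarray(prior) @ risk(F, delta, L))

print(risk(F, delta, L))                         # approximately [0.2, 0.25]
print(integrated_risk([0.5, 0.5], F, delta, L))  # approximately 0.225
```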
Definition 1.11. For any non-negative measure $P$ on $(\Phi,\mathcal{B}_\Phi)$, $\delta^*\in\mathcal{D}_0$ is said to be a Laplace procedure for $P$ relative to $\mathcal{D}_0$ if and only if

$$B(P,\delta^*)=\inf_{\delta\in\mathcal{D}_0}B(P,\delta).$$

If $P$ is a probability measure, that is, $P\in\mathcal{P}(\mathcal{B}_\Phi)$, then $\delta^*$ is called a Bayes procedure.

Definition 1.12. Given $F(\cdot\mid\cdot)$ and $P(\cdot)$ as above, define $\Pi'$ to be the measure generated by $F$ and $P$ on the product space $(S\times\Phi,\mathcal{B}_S\times\mathcal{B}_\Phi)$. Thus $\Pi'$ is the measure generated by the relation

$$\Pi'(S\times A)=\int_A F(S\mid\phi)\,dP(\phi)$$

for $S\in\mathcal{B}_S$, $A\in\mathcal{B}_\Phi$. Let $\Pi$ denote the projection of $\Pi'$ on $(S,\mathcal{B}_S)$, i.e., $\Pi(S)=\Pi'(S\times\Phi)$.

Notation. If it exists, we denote the $\mathcal{B}_S$ measurable conditional measure on $\mathcal{B}_\Phi$ given $S$ relative to $\Pi'$ by $P(\cdot\mid\cdot)$. That is, $P(\cdot\mid\cdot)$ is $\mathcal{B}_S$ measurable and satisfies

$$\int_S P(A\mid s)\,\Pi(ds)=\Pi'(S\times A)\quad\text{for all } S\in\mathcal{B}_S,\ A\in\mathcal{B}_\Phi.$$

If $P(\cdot)$ is a probability distribution then $P(\cdot\mid\cdot)$ is called the posterior distribution on $\Phi$ given $(S,\mathcal{B}_S)$.

Definition 1.13. When $P(\cdot\mid\cdot)$ exists, define for $a\in\mathcal{A}$, $s\in S$

$$B(a\mid s)=\int L(\phi,a)\,P(d\phi\mid s).$$

$B(a\mid s)$ may be described as the posterior risk incurred from taking action $a$.


Theorem 1.1 (Brown, 1974). Assume $P(\cdot\mid\cdot)$ exists. Let

$$A^*(s)=\{a: a\in\mathcal{A},\ B(a\mid s)=\inf_{a'\in\mathcal{A}}B(a'\mid s)\}. \tag{1.1}$$

Suppose $A^*(s)$ is non-empty a.e. $\Pi$ and there exists a procedure $\delta^*\in\mathcal{D}_0$ such that $\delta^*(s,A^*(s))=1$ a.e. $\Pi$. Then $\delta^*$ is a Laplace procedure for $P$. If $B(P)<\infty$ then any other Laplace procedure $\delta$ must also satisfy $\delta(s,A^*(s))=1$ a.e. $\Pi$.


Remark 1.3. If $\mathcal{A}$ is finite then $A^*(s)$ is non-empty a.e. $\Pi$.

Corollary 1.1 (Brown, 1974). Suppose the set $A^*(s)$ as defined in (1.1) consists of a single point of $\mathcal{A}$ a.e. $\Pi$. Suppose that $\mathcal{B}_{\mathcal{A}}$ contains all single points and that there is a measurable function $d:S\to\mathcal{A}$ such that $\{d(s)\}=A^*(s)$ a.e. $\Pi$. Then the non-randomized procedure $\delta^*$ defined by

$$\delta^*(s,\cdot)=\epsilon_{d(s)}(\cdot),$$

where $\epsilon_{d(s)}$ denotes the probability measure which gives probability 1 to the point $d(s)$, is a Laplace procedure for $P$.

Suppose in addition $F(\cdot\mid\phi)$ is absolutely continuous with respect to $\Pi$ for every $\phi\in\Phi$ and there is a $\delta\in\mathcal{D}_0$ such that $B(P,\delta)<\infty$. Then $\delta^*$ defined above is the unique Laplace procedure.
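In a finite problem, Theorem 1.1 and Corollary 1.1 reduce to a recipe: compute the posterior, compute the posterior risk $B(a\mid s)$ of each action, and take an action attaining the minimum at each $s$. The sketch below does this for a hypothetical two-parameter, three-outcome, two-action problem, assuming NumPy and assuming every sample point has positive marginal probability $\Pi(\{s\})$ so that the posterior is well defined; ties are broken toward the lowest action index.

```python
import numpy as np

def posterior(prior, F):
    """P(phi | s) for every s; columns of the result are indexed by s."""
    joint = np.asarray(prior)[:, None] * F            # Pi'({s} x {phi}) = F(s | phi) P(phi)
    return joint / joint.sum(axis=0, keepdims=True)   # divide by the marginal Pi({s})

def posterior_risk(prior, F, L):
    """B(a | s) = sum_phi L(phi, a) P(phi | s), returned as an array indexed [s, a]."""
    return posterior(prior, F).T @ L

def bayes_rule(prior, F, L):
    """A non-randomized Bayes rule d(s): an action minimizing B(. | s)."""
    return posterior_risk(prior, F, L).argmin(axis=1)

# Hypothetical problem: F[phi, s] = F({s} | phi), L[phi, a] = L(phi, a).
F = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])
prior = np.array([0.5, 0.5])

d = bayes_rule(prior, F, L)
print(d)  # [0 1 1]
```

Because the minimizer is unique at every sample point here, Corollary 1.1 says this non-randomized rule is the unique Bayes procedure for the given prior.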


The following theorem is well known.

Theorem 1.2 (Brown, 1974). If $\delta^*\in\mathcal{D}_0$ is the unique Laplace procedure for some $P$, then $\delta^*$ is admissible.
A wide variety of statistical results are based on the compactness of $\mathcal{R}(\mathcal{D}_0)$. We shall list three important ones.

Theorem 1.3 (Brown, 1974). Suppose $\mathcal{R}(\mathcal{D}_0)$ is compact in $T$. Then

(1) There exists a minimal complete class relative to $\mathcal{D}_0$.

(2) There exists an admissible minimax procedure relative to $\mathcal{D}_0$.

In order to state the next result, which is the only result that will be used in this thesis, it is necessary to describe what is meant by the limit of a net of decision procedures. This is most easily done when the family of distributions $\{F(\cdot\mid\phi):\phi\in\Phi\}$ is dominated by some σ-finite measure $\mu$, and $\mathcal{A}$ has an appropriate topology on it for which $\mathcal{A}$ is compact and $\mathcal{B}_{\mathcal{A}}=\mathcal{B}(\mathcal{A})$, the Baire σ-field on $\mathcal{A}$. So under these assumptions we define convergence.

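Definition 1.14. Under the assumptions stated above, a net $\{\delta_\alpha\}$ is said to converge to $\delta$ in the weak topology on $\mathcal{D}$ if and only if for every $f\in L_1(S,\mathcal{B}_S,\mu)$ and $\ell\in C(\mathcal{A})$ ($C(\mathcal{A})$ is the class of real-valued continuous functions on $\mathcal{A}$)

$$\int\!\!\int f(s)\,\ell(a)\,\delta_\alpha(s,da)\,\mu(ds)\to\int\!\!\int f(s)\,\ell(a)\,\delta(s,da)\,\mu(ds).$$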

Definition 1.15. Any non-negative measure on $(\Phi,\mathcal{B}_\Phi)$ is called a prior. A measure $P$ on $(\Phi,\mathcal{B}_\Phi)$ is called simple if $P$ is a discrete measure concentrated on a finite subset of $\Phi$.

We now state the third result.
Theorem 1.4 (Brown, 1974). Suppose the family of distributions $\{F(\cdot\mid\phi):\phi\in\Phi\}$ is dominated by the σ-finite measure $\mu$, $\mathcal{A}$ is compact second countable and $L(\phi,\cdot)$ is lower semi-continuous for each $\phi\in\Phi$. Let $\mathcal{D}_0$ be (weakly) compact and $\mathcal{R}(\mathcal{D}_0)$ closed and convex in $T$. Then the (weak) closure of the set of Bayes procedures for simple priors relative to the set $\mathcal{D}_0$ is an essentially complete class in $\mathcal{D}_0$.

The following theorem gives a sufficient condition for $\mathcal{R}(\mathcal{D}_0)$ to be compact.

Theorem 1.5 (Brown, 1974). Suppose the family of distributions $\{F(\cdot\mid\phi):\phi\in\Phi\}$ is dominated by some σ-finite measure $\mu$, and $\mathcal{A}$ has an appropriate topology on it for which $\mathcal{A}$ is compact second countable Hausdorff, $\mathcal{B}_{\mathcal{A}}=\mathcal{B}(\mathcal{A})$, and $L(\phi,\cdot)$ is lower semi-continuous for every $\phi\in\Phi$. Then $\mathcal{D}_0$ closed in the (weak) topology on $\mathcal{D}$ implies that $\mathcal{R}(\mathcal{D}_0)$ is compact and hence closed in $T$.

Remark 1.4. If the hypothesis of Theorem 1.5 is satisfied, then for Theorem 1.3 to apply one has to check that $\mathcal{D}_0$ is weakly closed. For Theorem 1.4 to apply, one checks in addition that $\mathcal{R}(\mathcal{D}_0)$ is convex.

Remark 1.5. Suppose the family of distributions $\{F(\cdot\mid\phi):\phi\in\Phi\}$ is dominated by some σ-finite measure $\mu$ and $\mathcal{A}$ is finite. Then, by giving $\mathcal{A}$ the discrete topology, the hypothesis of Theorem 1.5 is satisfied for any loss function $L$.
CHAPTER 2

DECISION-THEORETIC RESULTS FOR THE SUBSET SELECTION PROBLEM

This chapter deals with some decision-theoretic results for subset selection problems. In Sections 1 and 2 we show that essentially nothing is lost if we restrict our attention to Bayes procedures only. In particular, in Section 1 it is shown that relative to the class of all subset selection procedures, Bayes procedures together with their limits form an essentially complete class. In Section 2 it is shown that relative to the class of permutationally invariant procedures, Bayes procedures for exchangeable priors, together with their limits, form an essentially complete class.

In Section 3 the concepts of Total Monotone Likelihood Ratio (TMLR) and Total Stochastic Monotone Property (TSMP) are introduced as multivariate generalizations of the concepts of (univariate) monotone likelihood ratio and (univariate) stochastic ordering. The related concept of Property M, first introduced in Eaton (1967), is also described. Implications of each of the concepts and relationships between the three are studied in detail. Examples of families of distributions having the various properties are also given. These concepts are used in all of the succeeding sections to obtain results concerning the form of Bayes procedures.

In Section 4 the form of Bayes procedures when the densities have Property M and the loss function is monotone is investigated.

In Section 5 we describe a loss function which seems natural for the subset selection problem. For this loss function, sufficient conditions for procedures to be Bayes are given. For the same loss function, a sufficient condition for the uniqueness of Bayes procedures is given in Section 6.

In Section 7 the main theorem of this chapter is proved. It is shown that under certain conditions the class of monotone procedures forms an essentially complete class.


2.1. Decision-Theoretic Results for the General Subset Selection Problem

The sample space $S$ is a measurable space with an associated σ-field $\mathcal{B}_S$.

The parameter space $\Phi$ is a measurable space with an associated σ-field $\mathcal{B}_\Phi$.

We assume the set of possible distributions $\{F(\cdot\mid\phi):\phi\in\Phi\}$ is dominated by some σ-finite measure $\mu$.

The action space $\mathcal{A}$ is the set of all non-empty subsets of $\{1,2,\dots,p\}$, together with the power set of $\mathcal{A}$ as its associated σ-algebra $\mathcal{B}_{\mathcal{A}}$. The action $a\subset\{1,2,\dots,p\}$ is to be interpreted as the action of selecting the populations $\{\Pi_i: i\in a\}$.
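For small $p$ the action space can be listed explicitly: it has $2^p-1$ elements, one for each non-empty subset of $\{1,\dots,p\}$. A short standard-library sketch (the function name is illustrative):

```python
from itertools import combinations

def action_space(p):
    """All non-empty subsets of {1, ..., p}; each is one possible subset selection."""
    populations = range(1, p + 1)
    return [frozenset(c) for k in range(1, p + 1) for c in combinations(populations, k)]

print(len(action_space(3)))  # 7
```

Since the list grows exponentially in $p$, a procedure is in practice specified by a rule mapping the data to a subset rather than by a table over all of $\mathcal{A}$.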

A subset selection procedure $\delta$ is a measurable function from $(S,\mathcal{B}_S)$ to $\mathcal{P}(\mathcal{B}_{\mathcal{A}})$. The class of all subset selection procedures is denoted by $\mathcal{D}$.

The loss function $L$ is a measurable function from $(\Phi\times\mathcal{A},\mathcal{B}_\Phi\times\mathcal{B}_{\mathcal{A}})$ to $([0,\infty],\mathcal{B}([0,\infty]))$.

By Remark 1.5 and Theorem 1.5 of Chapter 1, $\mathcal{R}(\mathcal{D})$ is compact. Now $\mathcal{R}(\mathcal{D})$ is always convex. Hence Theorem 1.3 and Theorem 1.4 apply. However, we shall only make use of Theorem 1.4, which is restated as

Theorem 2.1.1. Relative to $\mathcal{D}$, the (weak) closure in the topology on $\mathcal{D}$ of the procedures that are Bayes relative to $\mathcal{D}$ forms an essentially complete class.

We shall assume, throughout the thesis, that the parameter space $\Phi$ is a subset of the Euclidean space $\mathbb{R}^{p+r}$, and that for $\phi\in\Phi$, the first $p$ components of $\phi$ are the parameters of interest, and the last $r$ components of $\phi$ are nuisance parameters. When we write $(\theta,\psi)\in\Phi$, $\theta$ will always be the $p$-dimensional vector of parameters of interest and $\psi$ will always be the $r$-dimensional vector of nuisance parameters. The projection of $\Phi$ onto the first $p$ coordinates will be denoted by $\Theta$ and the projection of $\Phi$ onto the last $r$ coordinates will be denoted by $\Psi$. Note that $\Theta\times\Psi$ does not necessarily equal $\Phi$. We shall assume that $\mathcal{B}_\Phi$ is the σ-field inherited from the Borel σ-field on $\mathbb{R}^{p+r}$.

We shall assume, throughout the entire thesis, that the sample space $S$ is a subset of the Euclidean space $\mathbb{R}^{p+q}$. When we write $(x,y)\in S$, $x$ shall always be a $p$-dimensional vector and $y$ shall always be a $q$-dimensional vector. Roughly speaking, $x$ will be the part of the observation that gives information concerning the relative ordering of the $\theta$'s, while $y$ will be the remaining part of the observation. The projection of $S$ onto the first $p$ coordinates will be denoted by $X$ and the projection of $S$ onto the last $q$ coordinates will be denoted by $Y$. Again, $X\times Y$ need not equal $S$. We shall assume that $\mathcal{B}_S$ is the σ-field inherited from the Borel σ-field on $\mathbb{R}^{p+q}$.

To fix ideas, consider the following example. Suppose $n$ observations are taken from each of $p$ independent normal populations with unknown means and unknown, possibly unequal, variances. Say the observations are $Z_i^{(\alpha)}$, $\alpha=1,\dots,n$, $i=1,\dots,p$, where $\{Z_i^{(\alpha)}:\alpha=1,\dots,n\}$ are iid $N(\mu_i,\sigma_i^2)$. By sufficiency we can reduce the observations to $(\bar Z_1,\dots,\bar Z_p,S_1^2,\dots,S_p^2)$. Suppose we want to select in terms of the means. Then $\theta=(\mu_1,\dots,\mu_p)$, $\psi=(\sigma_1^2,\dots,\sigma_p^2)$, $x=(\bar Z_1,\dots,\bar Z_p)$ and $y=(S_1^2,\dots,S_p^2)$.
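The reduction in this example is mechanical. A minimal NumPy sketch, assuming the data are stored as an $n\times p$ array whose column $i$ holds the $n$ observations from population $i$, and assuming the unbiased sample variance (divisor $n-1$) for $S_i^2$:

```python
import numpy as np

def reduce_normal_samples(z):
    """Map an (n, p) array of observations to (x, y) = (sample means, sample variances)."""
    z = np.asarray(z, dtype=float)
    x = z.mean(axis=0)          # x = (Zbar_1, ..., Zbar_p)
    y = z.var(axis=0, ddof=1)   # y = (S_1^2, ..., S_p^2), divisor n - 1
    return x, y

x, y = reduce_normal_samples([[1.0, 2.0], [3.0, 6.0]])
print(x, y)  # [2. 4.] [2. 8.]
```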


2.2. Decision-Theoretic Results for the (Permutationally) Invariant Subset Selection Problem

Notation. Let $S_p$ be the group of permutations on $\{1,2,\dots,p\}$ (the symmetric group on $p$ letters). The element of $S_p$ which interchanges $i$ and $j$, leaving all other members of $\{1,2,\dots,p\}$ fixed, is denoted by $(i,j)$. For $(x,y)\in\mathbb{R}^{p+q}$ and $\pi\in S_p$, define $\pi(x,y)$ by $\pi(x,y)=(\pi x,y)$, where $\pi x$ is defined by $(\pi x)_i=x_{\pi^{-1}(i)}$. Similarly, for $(\theta,\psi)\in\mathbb{R}^{p+r}$ and $\pi\in S_p$, $\pi(\theta,\psi)$ is defined by $\pi(\theta,\psi)=(\pi\theta,\psi)$, where $(\pi\theta)_i=\theta_{\pi^{-1}(i)}$. For any set $S\subset\mathbb{R}^{p+q}$, $\pi S$ will denote the image of $S$ under $\pi$. Similarly, for any set $A\subset\mathbb{R}^{p+r}$, $\pi A$ denotes the image of $A$ under $\pi$.

We assume that the sample space $S$ is a Borel subset of $\mathbb{R}^{p+q}$ invariant under $S_p$, that is, for any $\pi\in S_p$, $\pi S=S$. We assume that $\mathcal{B}_S$ is the σ-field inherited from the Borel σ-field on $\mathbb{R}^{p+q}$.

We assume the parameter space $\Phi$ is a Borel subset of $\mathbb{R}^{p+r}$ invariant under $S_p$, that is, for any $\pi\in S_p$, $\pi\Phi=\Phi$. We assume $\mathcal{B}_\Phi$ is the σ-field inherited from the Borel σ-field on $\mathbb{R}^{p+r}$.

It is assumed that the family of possible distributions $\{F(\cdot,\cdot\mid\theta,\psi):(\theta,\psi)\in\Phi\}$ is dominated by some σ-finite measure $\mu$. We further assume the following invariance properties for the densities $\{f(\cdot,\cdot\mid\theta,\psi):(\theta,\psi)\in\Phi\}$ and the measure $\mu$:

$$f(\pi x,y\mid\pi\theta,\psi)=f(x,y\mid\theta,\psi),\qquad d\mu(\pi x,y)=d\mu(x,y).$$

The space of possible actions $\mathcal{A}$ is the set of all non-empty subsets of $\{1,2,\dots,p\}$, together with the power set of $\mathcal{A}$ as the associated σ-algebra $\mathcal{B}_{\mathcal{A}}$.

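A decision procedure $\delta$ is a measurable function from $(S,\mathcal{B}_S)$ to $\mathcal{P}(\mathcal{B}_{\mathcal{A}})$. The class of all decision procedures is denoted by $\mathcal{D}$. For $\delta\in\mathcal{D}$ and $\pi\in S_p$, define $\pi\delta$ by $\pi\delta(s,a)=\delta(\pi s,\pi a)$, where $\pi a=\{\pi(i): i\in a\}$. A procedure $\delta$ is said to be (permutationally) invariant if and only if $\delta(\pi s,\pi a)=\delta(s,a)$ for all $\pi\in S_p$, $a\in\mathcal{A}$. Denote the class of invariant procedures by $\mathcal{D}_I$.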


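The loss function $L$ is a measurable function from $(\Phi\times\mathcal{A},\mathcal{B}_\Phi\times\mathcal{B}_{\mathcal{A}})$ to $([0,\infty],\mathcal{B}([0,\infty]))$. We assume $L$ satisfies the following invariance assumption:

$$L(\pi\phi,\pi a)=L(\phi,a)\quad\text{for all }\phi,\ a\text{ and }\pi.$$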

Definition 2.2.1. A non-negative measure $P$ on $(\Phi,\mathcal{B}_\Phi)$ is said to be exchangeable if and only if for any $\pi\in S_p$, $A\in\mathcal{B}_\Phi$, $P(\pi A)=P(A)$.

Clearly, in the above setup the decision problem is invariant under the group $S_p$. The following Hunt-Stein type theorem gives support for considering only procedures in $\mathcal{D}_I$ if the prime consideration is the supremum of the risk.

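Theorem 2.2.1. Given any $\delta\in\mathcal{D}$, there exists $\delta_I\in\mathcal{D}_I$ such that

$$\sup_{\phi\in\Phi}R(\phi,\delta_I)\le\sup_{\phi\in\Phi}R(\phi,\delta).$$

Proof. Define $\delta_I$ by

$$\delta_I(s,a)=\frac{1}{p!}\sum_{\pi\in S_p}\delta(\pi s,\pi a).$$

Then $\delta_I$ is invariant. By the invariance of $f$, $\mu$ and $L$, $R(\phi,\delta_I)=(1/p!)\sum_{\pi\in S_p}R(\pi\phi,\delta)$, and since $\pi\Phi=\Phi$,

$$\sup_{\phi\in\Phi}R(\phi,\delta_I)=\sup_{\phi\in\Phi}\frac{1}{p!}\sum_{\pi\in S_p}R(\pi\phi,\delta)\le\frac{1}{p!}\sum_{\pi\in S_p}\sup_{\phi\in\Phi}R(\pi\phi,\delta)=\frac{1}{p!}\sum_{\pi\in S_p}\sup_{\phi\in\Phi}R(\phi,\delta)=\sup_{\phi\in\Phi}R(\phi,\delta).$$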
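The averaging step in this proof can be checked numerically on a toy case. The sketch below uses 0-based indices, represents $x$ as a tuple and an action as a frozenset, takes $(\pi x)_i=x_{\pi^{-1}(i)}$ and $\pi a=\{\pi(i):i\in a\}$, and omits $y$ because $S_p$ leaves it unchanged. It symmetrizes a hypothetical non-invariant rule (always keep population 0 together with the population having the largest $x_i$); the helper names are illustrative only.

```python
import itertools
import math

def permute_point(pi, x):
    """(pi x)_i = x_{pi^{-1}(i)}: the entry at position i moves to position pi[i]."""
    out = [None] * len(x)
    for i, xi in enumerate(x):
        out[pi[i]] = xi
    return tuple(out)

def permute_action(pi, a):
    """pi a = {pi(i) : i in a}."""
    return frozenset(pi[i] for i in a)

def symmetrize(delta, p):
    """delta_I(x, a) = (1/p!) * sum over pi in S_p of delta(pi x, pi a)."""
    perms = list(itertools.permutations(range(p)))
    def delta_I(x, a):
        total = sum(delta(permute_point(pi, x), permute_action(pi, a)) for pi in perms)
        return total / math.factorial(p)
    return delta_I

def biased_rule(x, a):
    """Hypothetical non-invariant rule: select {0, argmax_i x_i} with probability 1."""
    chosen = frozenset({0, max(range(len(x)), key=lambda i: x[i])})
    return 1.0 if a == chosen else 0.0

p = 3
actions = [frozenset(c) for k in range(1, p + 1) for c in itertools.combinations(range(p), k)]
delta_I = symmetrize(biased_rule, p)
x, sigma = (0.3, 1.2, -0.5), (2, 0, 1)
```

For this $x$ the symmetrized rule selects $\{1\}$, $\{0,1\}$ or $\{1,2\}$ with probability 1/3 each: it always keeps the largest observation and randomizes over where the original rule's fixed choice of population 0 lands.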

If one does restrict attention to (permutationally) invariant procedures only, then the following can be proved.

Lemma 2.2.1. $\mathcal{D}_I$ is closed in $\mathcal{D}$ in the weak topology on $\mathcal{D}$.

Proof. Suppose a net $\{\delta_\alpha\}$ in $\mathcal{D}_I$ converges to $\delta$. For fixed $a\in\mathcal{A}$ and $\pi\in S_p$, let $A_i=\{s:\delta(\pi s,\pi a)-\delta(s,a)>1/i\}$. Suppose $\mu(A_i)>0$. Then, since $\mu$ is σ-finite, there exists $B_i\subset A_i$ such that $0<\mu(B_i)<\infty$. Now $I_{B_i}$ and $I_{\pi B_i}\in L_1(S,\mathcal{B}_S,\mu)$, and both $I_{\{a\}}$ and $I_{\{\pi a\}}\in C(\mathcal{A})$. Hence

$$\int_{\pi B_i}\delta_\alpha(s,\pi a)\,d\mu(s)\to\int_{\pi B_i}\delta(s,\pi a)\,d\mu(s)\quad\text{and}\quad\int_{B_i}\delta_\alpha(s,a)\,d\mu(s)\to\int_{B_i}\delta(s,a)\,d\mu(s).$$

But, by the invariance of $\delta_\alpha$ and $\mu$, $\int_{\pi B_i}\delta_\alpha(s,\pi a)\,d\mu(s)=\int_{B_i}\delta_\alpha(s,a)\,d\mu(s)$ for all $\alpha$. Therefore, after the change of variables $s\mapsto\pi s$ in the first limit,

$$\int_{B_i}[\delta(\pi s,\pi a)-\delta(s,a)]\,d\mu(s)=0.$$

But the left side is at least $(1/i)\,\mu(B_i)>0$. Contradiction. So $\mu(A_i)=0$, and $\mu\{s:\delta(\pi s,\pi a)-\delta(s,a)>0\}=\lim_{i\to\infty}\mu(A_i)=0$. Similarly $\mu\{s:\delta(\pi s,\pi a)-\delta(s,a)<0\}=0$. Therefore $\mu\{s:\delta(\pi s,\pi a)\neq\delta(s,a)\}=0$. Since $a$ and $\pi$ are arbitrary, $\delta\in\mathcal{D}_I$.

Now if $\delta_1,\delta_2\in\mathcal{D}_I$ then $\alpha\delta_1+(1-\alpha)\delta_2\in\mathcal{D}_I$ for any $\alpha\in[0,1]$. Since $R(\phi,\delta)$ is linear in $\delta$, $\mathcal{R}(\mathcal{D}_I)$ is convex. Combining this and the previous lemma, we see that Theorem 1.3 and Theorem 1.4 apply. However, we shall only make use of Theorem 1.4 in this thesis, which we restate as:

Theorem 2.2.2. Relative to the class of (permutationally) invariant procedures $\mathcal{D}_I$, the (weak) closure in the topology on $\mathcal{D}$ of the Bayes procedures relative to $\mathcal{D}_I$ forms an essentially complete class.

Given a prior, it is often easier to find its Bayes procedure(s) relative to $\mathcal{D}$ than to find its Bayes procedure(s) relative to $\mathcal{D}_I$. The following theorem gives the needed connection.

Theorem 2.2.3. The class of procedures that are Bayes relative to $\mathcal{D}_I$ is contained in the class of procedures that are Bayes relative to $\mathcal{D}$ for exchangeable priors.

Proof. Suppose $\delta\in\mathcal{D}_I$ is Bayes relative to $\mathcal{D}_I$ for some prior probability measure $P$ on $(\Phi,\mathcal{B}_\Phi)$. Then it is easy to see that $\delta$ is Bayes relative to $\mathcal{D}_I$ for the prior $P_0$ defined by

$$P_0(A)=\frac{1}{p!}\sum_{\pi\in S_p}P(\pi A).$$

For $P_0$ a Bayes procedure relative to $\mathcal{D}$ exists; hence an invariant Bayes procedure relative to $\mathcal{D}$ must also exist. Call it $\delta'$. But $B(P_0,\delta)\le B(P_0,\delta')$. Thus $\delta$ is Bayes relative to $\mathcal{D}$ for $P_0$.
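For a prior with finitely many support points, the symmetrized prior $P_0(A)=(1/p!)\sum_{\pi}P(\pi A)$ used in this proof spreads the mass of each point $\theta$ evenly over its $p!$ rearrangements (rearrangements that coincide simply pool their mass). A small standard-library sketch, with a hypothetical prior represented as a dict from $\theta$-tuples to masses:

```python
import itertools
import math
from collections import defaultdict

def exchangeable_version(prior):
    """Return P_0 with P_0(A) = (1/p!) * sum over pi of P(pi A), for a discrete prior."""
    p = len(next(iter(prior)))
    perms = list(itertools.permutations(range(p)))
    out = defaultdict(float)
    for theta, mass in prior.items():
        for pi in perms:
            # each rearrangement of theta receives mass / p!
            out[tuple(theta[pi[i]] for i in range(p))] += mass / math.factorial(p)
    return dict(out)

P0 = exchangeable_version({(0, 0, 1): 0.6, (2, 1, 1): 0.4})
```

Here each of the three distinct rearrangements of (0, 0, 1) receives mass 0.2 and each rearrangement of (2, 1, 1) receives 0.4/3, so $P_0$ assigns equal mass to $\theta$ and $\pi\theta$ for every $\pi$, as Definition 2.2.1 requires.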

2.3. Some Orderings on Families of Distributions

In univariate statistical inference, the concept of monotone likelihood ratio plays a central role. Therefore it is reasonable to think that the concept of multivariate monotone likelihood ratio should be important in multivariate statistical inference. Unfortunately there has never been a unified theory of multivariate monotone likelihood ratio.

In studying different problems, different definitions of multivariate monotone likelihood ratio were proposed. In Pratt (1956), a definition of monotone likelihood ratio on contours was given. Karlin and Truax (1960), and later Hall and Kudo (1968), used essentially the same definition in studying slippage tests. In studying the complete ranking problem, Bahadur and Goodman (1952) and Lehmann (1966) made certain independence and permutational invariance assumptions and used univariate MLR. It was later found by Eaton (1967) that for the complete ranking problem a weaker condition, which he called Property M, suffices. As it turns out, none of the above concepts is really adequate for the problem at hand, namely the subset selection problem. We have therefore chosen to give our own definition of multivariate monotone likelihood ratio [...]. It is clear that these new concepts will be useful in studying other multiple comparison problems. However, it is not yet clear whether they will be useful in other, more general, multivariate inference problems.

Definition 2.3.1. For any fixed $B\subseteq K = \{1,2,\ldots,p\}$ ($B$ may be the empty set or the whole set), define a partial ordering 'less than or equal to in $B$' ($\le_B$) on $\mathcal{X}$ as follows: for $x = (x_1,x_2,\ldots,x_p)$ and $x' = (x'_1,x'_2,\ldots,x'_p)$, $x\le_B x'$ if and only if
$$x_i\le x'_i \ \text{ for } i\in B \quad\text{and}\quad x_i\ge x'_i \ \text{ for } i\notin B.$$

In words, $x\le_B x'$ if and only if $x$ is less than or equal to $x'$ in those coordinates that are in $B$ and greater than or equal to $x'$ in those coordinates that are not in $B$.
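Because every later definition is built on $\le_B$, it may help to see it spelled out operationally. The following minimal Python sketch is an illustration only; the function name `leq_B` is hypothetical, and coordinates are indexed $1,\ldots,p$ as in the text.

```python
from typing import Sequence, Set

def leq_B(x: Sequence[float], xp: Sequence[float], B: Set[int]) -> bool:
    """Return True iff x <=_B xp in the sense of Definition 2.3.1.

    Coordinates are indexed 1..p: x may not exceed xp on coordinates in B,
    and must be at least xp on coordinates outside B.
    """
    if len(x) != len(xp):
        raise ValueError("x and xp must have the same length p")
    for i, (xi, xpi) in enumerate(zip(x, xp), start=1):
        if i in B:
            if xi > xpi:
                return False
        elif xi < xpi:
            return False
    return True

# B = {1}: raising x_1 and lowering x_2 moves "up" in the ordering.
print(leq_B((0.0, 5.0), (1.0, 3.0), {1}))   # True
print(leq_B((0.0, 5.0), (1.0, 6.0), {1}))   # False
# With B = K this is the componentwise order; with B empty it is the reverse order.
print(leq_B((1, 2), (1, 3), {1, 2}))        # True
print(leq_B((1, 2), (1, 3), set()))         # False
```

The two extreme cases $B = K$ and $B = \emptyset$ show that $\le_B$ interpolates between the usual componentwise order and its reverse.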

The above definition induces a partial ordering on $S$ as follows:

Definition 2.3.2. For $(x,y),(x',y')\in S$, $(x,y)\le_B(x',y')$ if and only if $x\le_B x'$ and $y = y'$.

Definition 2.3.3. A set $A\subseteq S$ is said to be nondecreasing in $B$ ($\nearrow B$) if and only if $s\in A$, $s\le_B s' \Rightarrow s'\in A$.

Definition 2.3.4. A function $h: S\to\mathbb{R}$ is said to be nondecreasing in $B$ ($\nearrow B$) if and only if $s\le_B s' \Rightarrow h(s)\le h(s')$.

Thus a set is nondecreasing in $B$ if and only if its indicator function is nondecreasing in $B$.

In exactly the same way we define $\le_B$ and $\nearrow B$ on $\Phi$ (with $(\theta,\psi)\le_B(\theta',\psi')$ if and only if $\theta\le_B\theta'$ and $\psi = \psi'$), for subsets of $\Phi$, and for functions from $\Phi$ to $\mathbb{R}$.



Definition 2.3.5. The distribution $F(\cdot\mid\phi_1)$ is said to be stochastically smaller in $B$ than $F(\cdot\mid\phi_2)$ if and only if for any measurable set $A\subseteq S$ that is nondecreasing in $B$, $F(A\mid\phi_1)\le F(A\mid\phi_2)$.

Definition 2.3.6. The family of distributions $\{F(\cdot\mid\phi): \phi\in\Phi\}$ is said to be stochastically nondecreasing in $B$ (SM/B) if and only if for any measurable set $A\subseteq S$ that is nondecreasing in $B$, $F(A\mid\phi)$ as a function of $\phi$ is nondecreasing in $B$.

Remark 2.3.1. Lehmann (1955), in studying ordered families of distributions, defined and investigated families of distributions that are stochastically nondecreasing in $K = \{1,2,\ldots,p\}$.

Definition 2.3.7. The family of distributions $\{F(\cdot\mid\phi): \phi\in\Phi\}$ is said to have the Total Stochastic Monotone Property (TSMP) if and only if it is stochastically nondecreasing in $B$ for every $B\subseteq K = \{1,2,\ldots,p\}$.

Definition 2.3.8. The family of densities $\{f(\cdot\mid\phi): \phi\in\Phi\}$ is said to have nondecreasing likelihood ratio in $B$ (MLR/B) if and only if
$$s\le_B s',\ \phi\le_B\phi' \ \Longrightarrow\ f(s'\mid\phi)\,f(s\mid\phi')\le f(s\mid\phi)\,f(s'\mid\phi').$$

Remark 2.3.2. In investigating multivariate one-sided tests, Oosterhoff (1969) defined and used nondecreasing likelihood ratio in $K = \{1,2,\ldots,p\}$.

Definition 2.3.9. The family of densities $\{f(\cdot\mid\phi): \phi\in\Phi\}$ is said to have Total Monotone Likelihood Ratio (TMLR) if and only if it has nondecreasing likelihood ratio in $B$ for every $B\subseteq K = \{1,2,\ldots,p\}$.

Definition 2.3.10 (Eaton, 1967). Under the symmetric setup of Section 2.2, the family of densities $\{f(\cdot\mid\phi): \phi\in\Phi\}$ is said to have Property M if and only if, for $(x,y)\in S$, $(\theta,\psi)\in\Phi$, $x = (x_1,x_2,\ldots,x_p)$, $\theta = (\theta_1,\theta_2,\ldots,\theta_p)$,
$$x_i\le x_j,\ \theta_i\le\theta_j \ \Longrightarrow\ f(x,y\mid\theta,\psi)\ge f(x,y\mid(i,j)\theta,\psi),$$
where $(i,j)$ denotes the transposition of the $i$th and $j$th coordinates.

In exactly the same way we define TSMP, TMLR and, in the case of an exchangeable prior, Property M for the posterior distributions and densities $\{F(\cdot\mid s): s\in S\}$ and $\{f(\cdot\mid s): s\in S\}$.

Definition 2.3.11. For any subset selection procedure $\delta$ let $\delta_i(s) = \sum_{a\ni i}\delta(s,a)$, i.e., $\delta_i(s)$ is the probability of selecting $i$ having observed $s$. A subset selection procedure $\delta$ is said to be monotone if and only if for each $i$, $\delta_i(s)$ is essentially $\nearrow\{i\}$, that is, there do not exist $S_1, S_2\subseteq S$ with $\mu(S_1)>0$, $\mu(S_2)>0$ and $s_1\le_{\{i\}} s_2$ for all $s_1\in S_1$, $s_2\in S_2$, such that
$$\operatorname*{ess\,sup}_{s\in S_1}\delta_i(s) > \operatorname*{ess\,inf}_{s\in S_2}\delta_i(s).$$

Remark 2.3.3. What we call 'monotone' procedures have traditionally been called 'just' procedures in the literature. See Nagel (1970) and Gupta and Nagel (1971). Following Gupta and Huang (1976), we have changed the terminology to 'monotone', since we have in mind the analog of the classical univariate result of Karlin and Rubin (1956) on the class of monotone procedures.


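Theorem 2.3.1. TMLR $\Rightarrow$ TSMP.

Proof. Let $B\subseteq K = \{1,2,\ldots,p\}$ be fixed. For $\phi\le_B\phi'$ let $S^-$ and $S^+$ be the sets in $S$ for which $f(s\mid\phi') < f(s\mid\phi)$ and $f(s\mid\phi') > f(s\mid\phi)$, respectively.

Suppose $h$, a measurable function from $(S,\mathcal{B}_S)$ to $(\mathbb{R},\mathcal{B}(\mathbb{R}))$, is nondecreasing in $B$. Let $a = \sup_{s\in S^-}h(s)$ and $b = \inf_{s\in S^+}h(s)$. Then $b - a\ge 0$ by MLR/B and
$$E(h\mid\phi') - E(h\mid\phi) \ge a\,[F(S^-\mid\phi') - F(S^-\mid\phi)] + b\,[F(S^+\mid\phi') - F(S^+\mid\phi)] = (b-a)\,[F(S^+\mid\phi') - F(S^+\mid\phi)] \ge 0,$$
where the equality holds because the two densities agree off $S^-\cup S^+$, so that $F(S^-\mid\phi') - F(S^-\mid\phi) = -[F(S^+\mid\phi') - F(S^+\mid\phi)]$. This is true for all $B$. Hence TMLR $\Rightarrow$ TSMP.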

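Theorem 2.3.2. Under the symmetric setup of Section 2.2, TMLR $\Rightarrow$ Property M.

Proof. Suppose $\{f(\cdot,\cdot\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ has TMLR. Suppose $(x,y)\in S$, $(\theta,\psi)\in\Phi$, $x = (x_1,x_2,\ldots,x_p)$, $\theta = (\theta_1,\theta_2,\ldots,\theta_p)$. Clearly
$$x_i\le x_j,\ \theta_i\le\theta_j \ \Longrightarrow\ x\le_{\{i\}}(i,j)x,\ \ \theta\le_{\{i\}}(i,j)\theta.$$
By MLR/$\{i\}$ we have
$$f(x,y\mid\theta,\psi)\,f((i,j)x,y\mid(i,j)\theta,\psi)\ \ge\ f((i,j)x,y\mid\theta,\psi)\,f(x,y\mid(i,j)\theta,\psi),$$
or, since the invariance of the densities gives $f((i,j)x,y\mid(i,j)\theta,\psi) = f(x,y\mid\theta,\psi)$ and $f((i,j)x,y\mid\theta,\psi) = f(x,y\mid(i,j)\theta,\psi)$,
$$f^2(x,y\mid\theta,\psi)\ \ge\ f^2(x,y\mid(i,j)\theta,\psi).$$
This completes the proof.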


Examples of Families of Densities Having TMLR

We first make the easy observation that if $f(x\mid\theta) = \prod_{i=1}^p f_i(x_i\mid\theta_i)$ and $\mathcal{X} = \prod_{i=1}^p\mathcal{X}_i$, $\Theta = \prod_{i=1}^p\Theta_i$, then TMLR is equivalent to univariate MLR, that is, $\{f(\cdot\mid\theta): \theta\in\Theta\}$ has TMLR if and only if for each $i$, $\{f_i(\cdot\mid\theta_i): \theta_i\in\Theta_i\}$ has univariate MLR. In addition, we have the following theorem.


Theorem 2.3.3. Any family of distributions whose densities are of the form
$$C(\theta,\psi)\exp\Big[\sum_{i=1}^p Q_i(\theta_i,\psi)\,x_i\Big]\,g(\psi,y)\,h(x,y), \tag{2.3.1}$$
where each $Q_i$, for fixed $\psi$, is nondecreasing in $\theta_i$, has TMLR.

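Proof. Suppose $(x,y),(x',y)\in S$; $(\theta,\psi),(\theta',\psi)\in\Phi$; $x\le_B x'$ and $\theta\le_B\theta'$. Since the factors $C$, $g$ and $h$ cancel, we need to show that
$$\sum_{i=1}^p Q_i(\theta_i,\psi)\,x'_i + \sum_{i=1}^p Q_i(\theta'_i,\psi)\,x_i \ \le\ \sum_{i=1}^p Q_i(\theta_i,\psi)\,x_i + \sum_{i=1}^p Q_i(\theta'_i,\psi)\,x'_i,$$
or equivalently
$$\sum_{i=1}^p\big[Q_i(\theta'_i,\psi) - Q_i(\theta_i,\psi)\big](x'_i - x_i)\ \ge\ 0.$$
But this follows from $x\le_B x'$ and $\theta\le_B\theta'$: for $i\in B$ both factors are nonnegative, and for $i\notin B$ both are nonpositive. $B$ is arbitrary. Hence the densities have TMLR.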
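As a numerical sanity check of Theorem 2.3.3 (not part of the original argument), the sketch below uses independent $N(\theta_i,1)$ coordinates, a special case of (2.3.1) with $Q_i(\theta_i) = \theta_i$. It draws random pairs $x\le_B x'$, $\theta\le_B\theta'$ for random $B$ and evaluates the MLR/B inequality on the log scale. The family, the sample sizes and the helper names are assumptions made for illustration.

```python
import math
import random

def log_normal_density(x, theta):
    """Log density of independent N(theta_i, 1) coordinates (an instance of (2.3.1))."""
    return sum(-0.5 * (xi - ti) ** 2 - 0.5 * math.log(2 * math.pi)
               for xi, ti in zip(x, theta))

def move_up_in_B(v, B, rng):
    """Return v' with v <=_B v': increase coordinates in B, decrease the others."""
    return [vi + rng.random() if i in B else vi - rng.random()
            for i, vi in enumerate(v, start=1)]

def check_mlr_B(p=4, trials=2000, seed=0):
    """Smallest observed value of log RHS - log LHS of the MLR/B inequality."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        B = {i for i in range(1, p + 1) if rng.random() < 0.5}
        x = [rng.gauss(0, 1) for _ in range(p)]
        th = [rng.gauss(0, 1) for _ in range(p)]
        xp, thp = move_up_in_B(x, B, rng), move_up_in_B(th, B, rng)
        # MLR/B: f(x'|th) f(x|th') <= f(x|th) f(x'|th')
        lhs = log_normal_density(xp, th) + log_normal_density(x, thp)
        rhs = log_normal_density(x, th) + log_normal_density(xp, thp)
        worst = min(worst, rhs - lhs)
    return worst

print(check_mlr_B() >= -1e-9)   # True
```

Here `rhs - lhs` equals $\sum_i(\theta'_i-\theta_i)(x'_i-x_i)$ up to rounding, which is exactly the quantity shown to be nonnegative in the proof.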

Example 1. The multinomial density
$$\binom{n}{x_1,\ldots,x_p}\theta_1^{x_1}\cdots\theta_p^{x_p} = \binom{n}{x_1,\ldots,x_p}\,e^{\sum_{i=1}^p x_i\ln\theta_i}$$
is in the form of (2.3.1).

Example 2. The Dirichlet density
$$\frac{\Gamma(x_1+\cdots+x_p)}{\Gamma(x_1)\cdots\Gamma(x_p)}\,\theta_1^{x_1-1}\cdots\theta_p^{x_p-1}$$
is in the form of (2.3.1).

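Example 3. Consider the case of taking $n$ observations each from $p$ independent normal populations with unknown means and unknown but equal variances. The sufficient statistic in this case is $\bar X_1,\ldots,\bar X_p, S^2$, the sample means and the pooled estimate of the common variance. The joint density
$$c(\sigma^2)\prod_{i=1}^p e^{-n(\bar x_i-\theta_i)^2/(2\sigma^2)}\;h(s^2\mid\sigma^2),$$
where $h(s^2\mid\sigma^2)$ denotes the density of $S^2$, is essentially in the form of (2.3.1), with $Q_i(\theta_i,\sigma^2) = n\theta_i/\sigma^2$ once the squares are expanded. The remaining factor $\exp[-n\sum_i\bar x_i^2/(2\sigma^2)]$ involves both $\bar x$ and $\sigma^2$ but, like $g$ and $h$, cancels in the proof of Theorem 2.3.3.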


Examples of Families of Distributions Having TSMP

We first note that if $F(x\mid\theta) = \prod_{i=1}^p F_i(x_i\mid\theta_i)$, then TSMP reduces to univariate stochastic ordering, that is, $\{F(\cdot\mid\theta): \theta\in\Theta\}$ has TSMP if and only if for each $i$, $\{F_i(\cdot\mid\theta_i): \theta_i\in\Theta_i\}$ is stochastically ordered, where $\Theta_i$ is the projection of $\Theta$ onto the $i$th coordinate. In addition, we have the following theorem.

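Theorem 2.3.4. Suppose $\mathcal{X} = \Theta = \mathbb{R}^p$ and $F(x,y\mid\theta,\psi) = F_0(x-\theta,y\mid\psi)$, that is, $\mathcal{F} = \{F(\cdot,\cdot\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ is a location family of distributions. Then $\mathcal{F}$ has TSMP.

Proof. Suppose $A\in\mathcal{B}_S$ is nondecreasing in $B$, $(\theta,\psi),(\theta',\psi)\in\Phi$ and $\theta\le_B\theta'$. Since $x+\theta\le_B x+\theta'$,
$$A_\theta = \{(x,y): (x+\theta,y)\in A\}\ \subseteq\ A_{\theta'} = \{(x,y): (x+\theta',y)\in A\},$$
and hence
$$F(A\mid\theta,\psi) = F_0(A_\theta\mid\psi)\le F_0(A_{\theta'}\mid\psi) = F(A\mid\theta',\psi).$$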


Examples of Families of Densities Having Property M

It is easy to see that under the symmetric setup of Section 2.2, if $f(x\mid\theta) = \prod_{i=1}^p f(x_i\mid\theta_i)$, then Property M is equivalent to univariate MLR. In addition, the following theorem concerning elliptically contoured families of distributions was proved in Eaton (1967):

Theorem 2.3.5 (Eaton, 1967). Suppose $\{f(\cdot\mid\theta): \theta\in\mathbb{R}^p\}$ are densities with respect to Lebesgue measure on $\mathbb{R}^p$. Suppose further
$$f(x\mid\theta) = C(A)\,g\big[(x-\theta)A(x-\theta)'\big],$$
where $A$ is a $p\times p$ positive definite matrix, $g$ is strictly decreasing, and $C(A)$ is a positive constant. Then the following are equivalent:

(i) $\{f(\cdot\mid\theta): \theta\in\mathbb{R}^p\}$ has Property M;

(ii) $A = c_1 I - c_2\,\mathbf{e}'\mathbf{e}$, where $\mathbf{e} = (1,\ldots,1)$, $c_1>0$ and $-\infty < c_2/c_1 < 1/p$.
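The direction (ii) $\Rightarrow$ (i) can be seen directly: swapping $\theta_i$ and $\theta_j$ leaves $\sum_k(x_k-\theta_k)$ unchanged and increases the quadratic form by $2c_1(x_i-x_j)(\theta_i-\theta_j)$, which is nonnegative when $x_i\le x_j$ and $\theta_i\le\theta_j$. The Python sketch below checks this numerically under the assumed choice $g(t) = e^{-t/2}$ (a normal density); the helper names and parameter values are hypothetical, and the converse direction is not tested.

```python
import random

def quad_form(v, c1, c2):
    """v A v' for A = c1*I - c2*e'e with e = (1, ..., 1)."""
    return c1 * sum(t * t for t in v) - c2 * sum(v) ** 2

def log_f(x, theta, c1, c2):
    """log g[(x - theta) A (x - theta)'] for the assumed g(t) = exp(-t/2).

    The constant C(A) is omitted because it cancels in Property M.
    """
    d = [xi - ti for xi, ti in zip(x, theta)]
    return -0.5 * quad_form(d, c1, c2)

def property_m_holds(p=4, c1=1.0, c2=0.2, trials=2000, seed=1):
    """Check f(x|theta) >= f(x|(i,j)theta) whenever x_i <= x_j and theta_i <= theta_j."""
    assert c1 > 0 and c2 / c1 < 1 / p, "A must be positive definite"
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(p)]
        th = [rng.gauss(0, 1) for _ in range(p)]
        i, j = rng.sample(range(p), 2)
        if x[i] > x[j]:              # arrange x_i <= x_j
            i, j = j, i
        if th[i] > th[j]:            # arrange theta_i <= theta_j
            th[i], th[j] = th[j], th[i]
        th_swapped = th[:]
        th_swapped[i], th_swapped[j] = th[j], th[i]   # (i,j)theta
        if log_f(x, th, c1, c2) < log_f(x, th_swapped, c1, c2) - 1e-9:
            return False
    return True

print(property_m_holds())          # True
print(property_m_holds(c2=-0.5))   # True: negative c2 is allowed by (ii)
```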

Clearly, in the symmetric setup there are families of distributions having TSMP but not Property M, since univariate stochastic ordering does not imply univariate MLR. On the other hand, Property M does not imply TSMP either. The following example shows this.


This family of four distributions clearly has Property M. But it does not have TSMP. For instance, $P\{(5,6)\mid(1,2)\} = 0.9 > P\{(5,6)\mid(3,4)\} = 0.6$.

The following diagram summarizes the relationship between TMLR, TSMP and Property M.


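TMLR $\Rightarrow$ TSMP (Theorem 2.3.1)
TMLR $\Rightarrow$ Property M (Theorem 2.3.2, symmetric setup)
TSMP $\not\Rightarrow$ Property M, and Property M $\not\Rightarrow$ TSMP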


The following is a generalization of a result in Lehmann (1955).

Theorem 2.3.6. Suppose $F(\cdot\mid\phi)$ is stochastically smaller than $F(\cdot\mid\phi')$ in $B$. Then for any measurable function $h$ that is nondecreasing in $B$, if $E_\phi[h(X,Y)]$ and $E_{\phi'}[h(X,Y)]$ exist, then $E_\phi[h(X,Y)]\le E_{\phi'}[h(X,Y)]$.

Proof. Let $h^+$ and $h^-$ be the positive and negative parts of $h$, respectively. We shall approximate $h^+$ by a sequence of simple functions. Let
$$h_n(x,y) = \begin{cases}(i-1)/2^n & \text{for } (x,y)\in S_i^{(n)},\ i = 1,2,\ldots,n2^n,\\ n & \text{for } (x,y)\in S_N^{(n)},\end{cases}$$
where
$$S_i^{(n)} = \{(x,y): (i-1)/2^n\le h^+(x,y) < i/2^n\},\quad i = 1,2,\ldots,n2^n,$$
$$S_N^{(n)} = \{(x,y): h^+(x,y)\ge n\},\quad N = n2^n+1.$$
Then
$$h_n = \frac{1}{2^n}\Big(I_{S_2^{(n)}\cup\cdots\cup S_N^{(n)}} + I_{S_3^{(n)}\cup\cdots\cup S_N^{(n)}} + \cdots + I_{S_N^{(n)}}\Big) = \sum_{i=2}^{N}\frac{1}{2^n}\,I_{\bigcup_{j=i}^N S_j^{(n)}}$$
and $h_n\uparrow h^+$. Now for each $i$, $\bigcup_{j=i}^N S_j^{(n)} = \{(x,y): h^+(x,y)\ge(i-1)/2^n\}$ is nondecreasing in $B$, since $h^+$ is nondecreasing in $B$. Hence $E_\phi[h_n(X,Y)]\le E_{\phi'}[h_n(X,Y)]$.

Using the Monotone Convergence Theorem, we have
$$E_\phi[h^+(X,Y)] = \lim_{n\to\infty}E_\phi[h_n(X,Y)]\le\lim_{n\to\infty}E_{\phi'}[h_n(X,Y)] = E_{\phi'}[h^+(X,Y)].$$
Similarly, since $h^-$ is nonincreasing in $B$, we can prove that $E_\phi[h^-(X,Y)]\ge E_{\phi'}[h^-(X,Y)]$. Thus if $E_\phi[h(X,Y)]$ and $E_{\phi'}[h(X,Y)]$ exist, then $E_\phi[h(X,Y)]\le E_{\phi'}[h(X,Y)]$.
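The key step in the proof is that $h_n$ is a nonnegative combination of indicators of the upper sets $\{h^+\ge(i-1)/2^n\}$, each of which inherits monotonicity in $B$ from $h^+$. The following Python sketch illustrates that identity pointwise, at a point where $h^+$ takes the value `v`; it is an illustration of the construction only, and the function names are hypothetical.

```python
import math

def h_n(v: float, n: int) -> float:
    """Dyadic approximation h_n at a point where h^+ takes the value v >= 0."""
    if v >= n:
        return float(n)                      # on S_N^(n) = {h^+ >= n}
    return math.floor(v * 2 ** n) / 2 ** n   # (i-1)/2^n on S_i^(n)

def h_n_as_indicators(v: float, n: int) -> float:
    """Same value as (1/2^n) * sum_{i=2}^{N} I{h^+ >= (i-1)/2^n}, N = n*2^n + 1,
    i.e. a sum of indicators of the upper sets S_i ∪ ... ∪ S_N."""
    N = n * 2 ** n + 1
    count = sum(1 for i in range(2, N + 1) if v >= (i - 1) / 2 ** n)
    return count / 2 ** n

for v in (0.0, 0.3, 1.7, 2.5, 10.0):
    approx = [h_n(v, n) for n in range(1, 8)]
    assert approx == sorted(approx)          # h_n is nondecreasing in n
    assert all(a == h_n_as_indicators(v, n)  # both representations agree
               for n, a in enumerate(approx, start=1))
print(h_n(1.7, 3))   # 1.625
```

Since each indicator term is a monotone set function of the level set, stochastic ordering in $B$ transfers to every $h_n$, and monotone convergence finishes the argument.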




The result that we will actually use is contained in the following corollary.

Corollary 2.3.1. Suppose the family of distributions $\{F(\cdot\mid\phi): \phi\in\Phi\}$ has TSMP. If $h: S\to[0,\infty]$ is measurable and nondecreasing in $B$, then $E_\phi[h(X,Y)]$ as a function of $\phi$ is nondecreasing in $B$.

Even though Property M does not imply TSMP, it does imply a sort of one-dimensional stochastic ordering, which we shall prove through a series of lemmas. So for the remainder of this section, we assume that the family of densities $\{f(\cdot,\cdot\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ has Property M.

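Lemma 2.3.1. For fixed $i,j$ ($1\le i,j\le p$) let $C_i^j = \{(x,y): (x,y)\in S,\ x_i\le x_j\}$. Then for any set $C\subseteq C_i^j$ measurable with respect to $\mathcal{B}_S$ and any $(\theta,\psi)\in\Phi$,
$$F(C\mid\theta,\psi)\ge F((i,j)C\mid\theta,\psi)\ \text{ if } \theta_i\le\theta_j,\qquad F(C\mid\theta,\psi)\le F((i,j)C\mid\theta,\psi)\ \text{ if } \theta_i\ge\theta_j.$$

Proof. Suppose $\theta_i\le\theta_j$. Then, by Property M and the invariance of $f$ and $\mu$,
$$F(C\mid\theta,\psi) = \int_C f(x,y\mid\theta,\psi)\,d\mu(x,y)\ \ge\ \int_C f(x,y\mid(i,j)\theta,\psi)\,d\mu(x,y) = \int_{(i,j)C}f(x,y\mid\theta,\psi)\,d\mu(x,y) = F((i,j)C\mid\theta,\psi).$$
Similarly for $\theta_i\ge\theta_j$.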

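Lemma 2.3.2. Suppose $C\subseteq S$ is measurable with respect to $\mathcal{B}_S$ and nondecreasing in $\{i\}$. Then
$$F(C\mid\theta,\psi)\le F((i,j)C\mid\theta,\psi)\ \text{ if } \theta_i\le\theta_j,\qquad F(C\mid\theta,\psi)\ge F((i,j)C\mid\theta,\psi)\ \text{ if } \theta_i\ge\theta_j.$$

Proof. Let $C_0 = \{(x,y): (x,y)\in C,\ ((i,j)x,y)\in C\}$, $C_i = \{(x,y): (x,y)\in C,\ ((i,j)x,y)\notin C\}$ and $C_j = \{(x,y): (x,y)\notin C,\ ((i,j)x,y)\in C\} = (i,j)C_i$. Then
$$F(C\mid\theta,\psi) = F(C_0\mid\theta,\psi) + F(C_i\mid\theta,\psi),\qquad F((i,j)C\mid\theta,\psi) = F(C_0\mid\theta,\psi) + F(C_j\mid\theta,\psi).$$
But $(x,y)\in C_j\Rightarrow x_i<x_j$, for $x_i\ge x_j$ and $((i,j)x,y)\in C$ would imply $(x,y)\in C$, since $(i,j)x\le_{\{i\}}x$ and $C$ is nondecreasing in $\{i\}$. Hence $C_j\subseteq C_i^j$. Therefore, applying the previous lemma to $C_j$ (recall $C_i = (i,j)C_j$),
$$F(C\mid\theta,\psi)\le F((i,j)C\mid\theta,\psi)\ \text{ if } \theta_i\le\theta_j,$$
and the reverse inequality holds if $\theta_i\ge\theta_j$.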

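Theorem 2.3.7. Suppose $h: S\to[0,\infty]$ is measurable with respect to $\mathcal{B}_S$ and nondecreasing in $\{i\}$. Then $E_{\theta,\psi}[h(X,Y)]\le E_{\theta,\psi}[h((i,j)X,Y)]$ if $\theta_i\le\theta_j$.

Proof. By invariance, $E_{\theta,\psi}[h((i,j)X,Y)] = E_{(i,j)\theta,\psi}[h(X,Y)]$. The sets $\{h\ge t\}$ are nondecreasing in $\{i\}$, so Lemma 2.3.2 gives $F(\{h\ge t\}\mid\theta,\psi)\le F(\{h\ge t\}\mid(i,j)\theta,\psi)$ when $\theta_i\le\theta_j$. Arguing with simple functions as in the proof of Theorem 2.3.6 (which yields Corollary 2.3.1), we obtain
$$E_{\theta,\psi}[h(X,Y)]\le E_{(i,j)\theta,\psi}[h(X,Y)] = E_{\theta,\psi}[h((i,j)X,Y)].$$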


2.4. Form of Bayes Procedures Under the Symmetric Setup
When the Densities Have Property M and
the Loss Function Is Monotone
The results of this section were essentially obtained in Goel and Rubin (1975). We assume that the setup of the problem is the symmetric setup of Section 2.2. To refresh our memory, we briefly review the setup.

The sample space $S$ is a Borel subset of $\mathbb{R}^{p+q}$ invariant under $S_p$, the symmetric group of order $p$. The associated σ-field $\mathcal{B}_S$ is the σ-field inherited from the Borel σ-field on $\mathbb{R}^{p+q}$.

The parameter space $\Phi$ is a Borel subset of a Euclidean space, invariant under $S_p$. The associated σ-field $\mathcal{B}_\Phi$ is the σ-field inherited from the Borel σ-field on that space.

The family of possible distributions $\{F(\cdot,\cdot\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ is dominated by some σ-finite measure $\mu$. The densities $\{f(\cdot,\cdot\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ and the dominating measure $\mu$ satisfy the following invariance assumptions: for any $\pi\in S_p$,
$$f(\pi x,y\mid\pi\theta,\psi) = f(x,y\mid\theta,\psi),\qquad d\mu(\pi x,y) = d\mu(x,y).$$

The action space $A$ is the set of all nonempty subsets of $K = \{1,2,\ldots,p\}$, together with the power set of $A$ as the associated σ-algebra $\mathcal{B}_A$.

A decision procedure is a measurable function from $(S,\mathcal{B}_S)$ to $\mathcal{P}(\mathcal{B}_A)$, the set of probability measures on $(A,\mathcal{B}_A)$. The class of all decision procedures is denoted by $\mathcal{D}$. A procedure $\delta\in\mathcal{D}$ is said to be (permutationally) invariant if and only if $\delta(\pi s,\pi a) = \delta(s,a)$ for all $s$, $a$ and $\pi\in S_p$. The class of all invariant procedures in $\mathcal{D}$ is denoted by $\mathcal{D}_I$.

The loss function $L$ is a measurable function from $(\Phi\times A,\mathcal{B}_\Phi\times\mathcal{B}_A)$ to $([0,\infty],\mathcal{B}([0,\infty]))$ satisfying the invariance condition
$$L(\pi\phi,\pi a) = L(\phi,a).$$

All the assumptions made in the above setup are invariance assumptions. We now make, in addition, two ordering assumptions.

We assume that the densities $\{f(x,y\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ have Property M as defined in the previous section. That is, for $(x,y)\in S$, $(\theta,\psi)\in\Phi$, $x = (x_1,x_2,\ldots,x_p)$, $\theta = (\theta_1,\theta_2,\ldots,\theta_p)$,
$$x_i\ge x_j,\ \theta_i\ge\theta_j\ \Longrightarrow\ f(x,y\mid\theta,\psi)\ge f(x,y\mid(i,j)\theta,\psi).$$

We assume that the loss function $L$, in addition to being invariant, has the following monotonicity property: for $(\theta,\psi)\in\Phi$, $\theta = (\theta_1,\theta_2,\ldots,\theta_p)$,
$$\theta_i\ge\theta_j,\ i\in a,\ j\notin a\ \Longrightarrow\ L((\theta,\psi),a)\le L((\theta,\psi),(i,j)a).$$


For each $a\in A$ let $B_a = \{(x,y): (x,y)\in S,\ x_i\ge x_j \text{ for all } i\in a,\ j\notin a\}$.

For each $P$, $(x,y)$ and $a$, consider $r_a(x,y)$ defined by
$$r_a(x,y) = \int_\Phi L((\theta,\psi),a)\,f(x,y\mid\theta,\psi)\,dP(\theta,\psi).$$
Note that $r_a(x,y)$ is proportional to the posterior risk $B(a\mid x,y)$.

Lemma 2.4.1 (Eaton, 1967). Under the assumptions made in this section, for any $a\in A$, $i\in a$ and $j\notin a$,
$$r_a(x,y)\le r_{(i,j)a}(x,y)\quad\text{for all } (x,y)\in B_a.$$

Proof. Let $\Phi_0 = \{(\theta,\psi)\in\Phi: \theta_i = \theta_j\}$, $\Phi_1 = \{(\theta,\psi)\in\Phi: \theta_i>\theta_j\}$ and $\Phi_2 = \{(\theta,\psi)\in\Phi: \theta_i<\theta_j\}$. Then
$$r_{(i,j)a}(x,y) - r_a(x,y) = \sum_{m=0}^2\int_{\Phi_m}\big[L((\theta,\psi),(i,j)a) - L((\theta,\psi),a)\big]f(x,y\mid\theta,\psi)\,dP(\theta,\psi). \tag{2.4.1}$$
The invariance assumptions (including the invariance of $P$ under $S_p$) imply that $L((\theta,\psi),(i,j)a) = L((\theta,\psi),a)$ for $(\theta,\psi)\in\Phi_0$ and
$$\int_{\Phi_2}\big[L((\theta,\psi),(i,j)a) - L((\theta,\psi),a)\big]f(x,y\mid\theta,\psi)\,dP(\theta,\psi) = \int_{\Phi_1}\big[L((\theta,\psi),a) - L((\theta,\psi),(i,j)a)\big]f(x,y\mid(i,j)\theta,\psi)\,dP(\theta,\psi).$$
Thus we can write (2.4.1) as
$$r_{(i,j)a}(x,y) - r_a(x,y) = \int_{\Phi_1}\big[L((\theta,\psi),(i,j)a) - L((\theta,\psi),a)\big]\big[f(x,y\mid\theta,\psi) - f(x,y\mid(i,j)\theta,\psi)\big]\,dP(\theta,\psi). \tag{2.4.2}$$
By Property M, $(\theta,\psi)\in\Phi_1$ and $(x,y)\in B_a$ imply $f(x,y\mid\theta,\psi) - f(x,y\mid(i,j)\theta,\psi)\ge 0$. Also $(\theta,\psi)\in\Phi_1$, $i\in a$, $j\notin a$ imply $L((\theta,\psi),(i,j)a) - L((\theta,\psi),a)\ge 0$. Hence (2.4.2) $\ge 0$ and (2.4.1) $\ge 0$. This completes the proof.


For each $s\in S$, let $\Pi_m(s) = \{a: a\in A,\ s\in B_a,\ |a| = m\}$, where $|a|$ denotes the number of elements in $a$, and $\Pi(s) = \bigcup_{m=1}^p\Pi_m(s)$.


Theorem 2.4.1. Under the assumptions made in this section, for any nonnegative measure $P$ on $(\Phi,\mathcal{B}_\Phi)$ which is invariant under $S_p$, a sufficient condition for a procedure $\delta^*$ to be Bayes for $P$ relative to $\mathcal{D}$ is
$$\delta^*(s,T(s)) = 1,$$
where
$$T(s) = \Big\{a: a\in\Pi(s),\ B(a\mid s) = \min_{a'\in\Pi(s)}B(a'\mid s)\Big\}.$$

Proof. By Theorem 1.1 a sufficient condition for $\delta^*$ to be Bayes is $\delta^*(s,S(s)) = 1$, where $S(s) = \{a: a\in A,\ B(a\mid s) = \min_{\alpha\in A}B(\alpha\mid s)\}$. By the previous lemma, for each $m$, $\min_{a\in\Pi_m(s)}B(a\mid s)\le B(a'\mid s)$ for every $a'\in A$ with $|a'| = m$. Hence $B(a\mid s) = \min_{a'\in\Pi(s)}B(a'\mid s)$ implies $B(a\mid s) = \min_{\alpha\in A}B(\alpha\mid s)$, that is, $T(s)\subseteq S(s)$.
a c A 


2.5. An Intuitive Loss Function
In interpreting subset selection procedures as screening procedures, we want the selected subset to contain the true 'best', but do not want the subset to be too large. It seems reasonable, then, that any symmetric loss function should contain at least the following three components:

1. Incorrect Selection (ICS) $= 1 - I_{\{\max_{i\in a}\theta_i = \max_{j=1,\ldots,p}\theta_j\}}$;

2. The size of the selected subset $a$, denoted by $|a|$;

3. Some measure of the average 'goodness' of the selected set, e.g. $\max_{j=1,\ldots,p}\theta_j - \frac{1}{|a|}\sum_{i\in a}\theta_i$.

The quantity $\max_{j=1,\ldots,p}\theta_j - \max_{i\in a}\theta_i$ is a combination of 1. and 3.

Traditionally ICS and $|a|$ have received the most attention. Goel and Rubin (1975) considered $c_1(\max_{j=1,\ldots,p}\theta_j - \max_{i\in a}\theta_i) + c_2|a|$ and, using essentially Theorem 2.3.7, obtained results concerning the form of Bayes procedures. Chernoff and Yahav (1977) considered $c_1(\max_{j=1,\ldots,p}\theta_j - \max_{i\in a}\theta_i) + (\max_{j=1,\ldots,p}\theta_j - \sum_{i\in a}\theta_i/|a|)$ and performed Monte Carlo studies assuming normal populations and normal exchangeable priors. They found that, in terms of Bayes risk, Gupta-type procedures are extremely good compared to Bayes procedures, but could not offer any explanation as to why this is so. Bickel and Yahav (1977) considered $c_1|a| + c_2(\max_{j=1,\ldots,p}\theta_j - \sum_{i\in a}\theta_i/|a|)$ and studied the asymptotic behavior of Bayes procedures as $p\to\infty$, assuming normal populations.

As it turns out, the form of the Bayes procedures is fairly sensitive to what is used for 3. As the inclusion of components 1. and 2. seems to be in general agreement, it seems reasonable to consider loss functions of the form $L(\phi,a) = c_1(\text{ICS}) + c_2|a|$.

Some care must be taken in defining ICS when several of the $\theta$'s are tied for the maximum. Traditionally, when this happens one of the $\theta$'s tied for the maximum is arbitrarily 'tagged' as the 'best'. This makes $P_{\theta,\psi}\{\text{ICS}\}$ a continuous function of $(\theta,\psi)$ in the usual case, namely, when the setup of the problem is symmetric, the procedure considered is permutationally invariant, and the family of possible distributions is in the exponential family of distributions. In this thesis we shall use a slightly different definition of ICS, which is equivalent to the traditional definition (in terms of risk functions) in the usual case described above, but not equivalent in general.

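Definition 2.5.1.
$$\text{ICS}(\phi,a) = 1 - \sum_{i\in a}\sum_{m=1}^p(1/m)\,I_{\Phi_{im}}(\phi),$$
where
$$\Phi_{im} = \Big\{\phi: \phi\in\Phi,\ \theta_i = \max_{j=1,\ldots,p}\theta_j,\ \text{and exactly } m \text{ of } \theta_1,\ldots,\theta_p \text{ equal } \max_{j=1,\ldots,p}\theta_j\Big\}.$$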


Remark 2.5.1. In words, $\Phi_{im}$ is the subset of $\Phi$ where $\theta_i$ ties with $m-1$ other $\theta$'s as the maximum. To fix ideas, suppose three of the $\theta$'s are tied for the maximum; if $a$ selects two of the three, then $\text{ICS}(\phi,a) = 1 - 2/3 = 1/3$.
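The tie-splitting rule in Definition 2.5.1 amounts to giving each of the $m$ populations tied for the maximum weight $1/m$. A minimal Python sketch, with the hypothetical helper name `ics` and indices $1,\ldots,p$ as in the text, reproduces the example of Remark 2.5.1:

```python
from typing import Sequence, Set

def ics(theta: Sequence[float], a: Set[int]) -> float:
    """ICS(phi, a) of Definition 2.5.1 for a selected subset a of {1, ..., p}.

    Each of the m populations tied for the maximum carries weight 1/m, so
    ICS = 1 - (number of tied maxima selected) / m.
    """
    top = max(theta)
    tied = {i for i, t in enumerate(theta, start=1) if t == top}
    return 1.0 - sum(1.0 / len(tied) for i in a if i in tied)

print(ics([2.0, 5.0, 5.0, 5.0], {2, 3}))   # three tied, two selected: about 1/3
print(ics([1.0, 4.0, 3.0], {2}))           # unique best selected: 0.0
print(ics([1.0, 4.0, 3.0], {1, 3}))        # best missed: 1.0
```

With no ties this coincides with the usual $0$-$1$ incorrect selection indicator.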


We shall use throughout the thesis the following shorthand notation.

Notation.
$$I_{\{\theta_i = \max_{j=1,\ldots,p}\theta_j\}}(\phi) = \sum_{m=1}^p(1/m)\,I_{\Phi_{im}}(\phi).$$

Note that, defined this way, $\sum_{i=1}^p I_{\{\theta_i = \max_{j=1,\ldots,p}\theta_j\}} = 1$.

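We now describe a Bayes procedure.

Theorem 2.5.1. If the loss function is $c_1(\text{ICS}) + c_2|a|$, then for any prior, the procedure $\delta_1$ defined by
$$\delta_1(s,a^*) = 1,\quad a^* = \Big\{i: P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} > c_2/c_1\Big\},\quad\text{if } a^*\ne\emptyset,$$
$$\delta_1(s,\{i\}) = 1/N(s)\ \text{ for each } i \text{ such that } P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} = \max_{m=1,\ldots,p}P\{\theta_m = \max_{j=1,\ldots,p}\theta_j\mid s\},\quad\text{if } a^* = \emptyset,$$
where
$$N(s) = \sum_{i=1}^p I_{\{P\{\theta_i = \max_j\theta_j\mid s\} = \max_{m}P\{\theta_m = \max_j\theta_j\mid s\}\}}(s),$$
is Bayes.

Proof. The posterior risk is
$$B(a\mid s) = c_1\Big[1 - \sum_{i\in a}P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\}\Big] + c_2|a|.$$
It is minimized by including exactly those $i$ with $c_1P\{\theta_i = \max_j\theta_j\mid s\} > c_2$ when there is at least one such $i$, and otherwise, since $a$ must be nonempty, by a singleton with the largest posterior probability. The result follows from Theorem 1.1.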

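Corollary 2.5.1. If $c_2/c_1 < 1/p$, then $\delta_1$ reduces to
$$\delta_2(s,a^*) = 1,\quad\text{where } a^* = \Big\{i: P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} > c_2/c_1\Big\}.$$

Proof.
$$\sum_{i=1}^p P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} = 1,$$
so $c_2/c_1 < 1/p$ implies $a^*\ne\emptyset$.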
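Once the posterior probabilities $P\{\theta_i = \max_j\theta_j\mid s\}$ are available, the rule of Theorem 2.5.1 is a simple thresholding. The Python sketch below takes those probabilities as given (how they are computed depends on the model and prior, which the sketch does not assume) and checks by brute force that the chosen subsets minimize the posterior risk $B(a\mid s)$. The function names and the numbers in `post` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence

def posterior_risk(post: Sequence[float], a, c1: float, c2: float) -> float:
    """B(a|s) = c1 * (1 - sum_{i in a} P{theta_i = max | s}) + c2 * |a|."""
    return c1 * (1.0 - sum(post[i - 1] for i in a)) + c2 * len(a)

def bayes_rule(post: Sequence[float], c1: float, c2: float) -> Dict[frozenset, float]:
    """delta_1 of Theorem 2.5.1 as a map {selected subset: probability}.

    post[i-1] plays the role of P{theta_i = max_j theta_j | s}.
    """
    a_star = frozenset(i for i, q in enumerate(post, start=1) if q > c2 / c1)
    if a_star:
        return {a_star: 1.0}
    best = max(post)   # a* empty: randomize over singletons of largest probability
    ties = [i for i, q in enumerate(post, start=1) if q == best]
    return {frozenset([i]): 1.0 / len(ties) for i in ties}

def brute_force_min(post: Sequence[float], c1: float, c2: float) -> float:
    """Minimum posterior risk over all nonempty subsets of {1, ..., p}."""
    p = len(post)
    return min(posterior_risk(post, a, c1, c2)
               for m in range(1, p + 1)
               for a in combinations(range(1, p + 1), m))

post = [0.05, 0.15, 0.30, 0.50]          # hypothetical posterior probabilities
rule = bayes_rule(post, c1=1.0, c2=0.2)  # c2/c1 = 0.2 < 1/p = 0.25
print(rule)                              # {frozenset({3, 4}): 1.0}
```

With $c_2/c_1 < 1/p$ the returned rule always has a single nonrandomized entry, which is the content of Corollary 2.5.1.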

Suppose the setup of the problem is symmetric; then $\delta_1$ is permutationally invariant for any exchangeable prior. In describing $\delta_1$ one can therefore assume, for any $s = (x_1,x_2,\ldots,x_p,y_1,y_2,\ldots,y_q)$, that $x_1\le x_2\le\cdots\le x_p$.

Corollary 2.5.2. Suppose the densities $\{f(\cdot,\cdot\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ have Property M and the prior is exchangeable. For $s = (x_1,x_2,\ldots,x_p,y_1,y_2,\ldots,y_q)$ assume without loss of generality $x_1\le x_2\le\cdots\le x_p$. Then $\delta_1$ can be written as $\delta_3$, where
$$\delta_3(s,a^*) = 1,\quad a^* = \{i: i>i^*\},\quad\text{if } i^*<p,$$
$$\delta_3(s,\{i\}) = 1/N(x)\ \text{ for each } i \text{ such that } x_i = x_p,\quad N(x) = \sum_{j=1}^p I_{\{x_j = x_p\}},\quad\text{if } i^* = p,$$
where $i^*$ is the largest integer $i$ such that
$$P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\}\le c_2/c_1$$
(and $i^* = 0$ if there is no such $i$).
Proof. This follows from the fact that the posterior densities with respect to the prior have Property M, and so, by Theorem 2.3.7,
$$x_i\le x_j\ \Longrightarrow\ P\{\theta_i = \max_{k=1,\ldots,p}\theta_k\mid s\}\le P\{\theta_j = \max_{k=1,\ldots,p}\theta_k\mid s\}.$$
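To see Corollary 2.5.2 at work, the sketch below assumes a hypothetical normal model: $\bar x_i\sim N(\theta_i,s^2)$ independently with known $s^2$, and an exchangeable prior with $\theta_i$ i.i.d. $N(0,\tau^2)$. This product of univariate MLR densities has Property M, and the posterior makes the $\theta_i$ independent $N(w\bar x_i, ws^2)$ with $w = \tau^2/(\tau^2+s^2)$. The probabilities $P\{\theta_i = \max_j\theta_j\mid s\}$ are computed by the trapezoid rule; all numbers and helper names are illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def prob_max(means, sd, grid=4001):
    """P{theta_i = max_j theta_j} for independent N(means[i], sd^2), via the
    trapezoid rule applied to  int pdf_i(t) * prod_{j != i} cdf_j(t) dt."""
    lo, hi = min(means) - 8 * sd, max(means) + 8 * sd
    step = (hi - lo) / (grid - 1)
    out = []
    for i, mi in enumerate(means):
        total = 0.0
        for k in range(grid):
            t = lo + k * step
            val = norm_pdf((t - mi) / sd) / sd
            for j, mj in enumerate(means):
                if j != i:
                    val *= norm_cdf((t - mj) / sd)
            weight = 0.5 if k in (0, grid - 1) else 1.0
            total += weight * val * step
        out.append(total)
    return out

s2, tau2 = 1.0, 4.0                    # hypothetical known variance and prior variance
x = [-0.5, 0.1, 0.4, 1.2]              # already sorted, as in Corollary 2.5.2
w = tau2 / (tau2 + s2)                 # posterior shrinkage factor
post = prob_max([w * xi for xi in x], math.sqrt(w * s2))
print([round(q, 3) for q in post])     # nondecreasing in i
c2_over_c1 = 0.2
i_star = max([i for i, q in enumerate(post, start=1) if q <= c2_over_c1], default=0)
print(i_star, {i for i in range(1, len(x) + 1) if i > i_star})
```

Because the posterior probabilities increase with $\bar x_i$, the thresholded set $\{i: P\{\theta_i=\max_j\theta_j\mid s\} > c_2/c_1\}$ is exactly $\{i: i>i^*\}$, as the corollary states.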




2.6. Uniqueness of Bayes Procedures
The following gives a sufficient condition for the uniqueness of Bayes procedures.

Theorem 2.6.1. Suppose that the setup of the problem is the symmetric setup of Section 2.2,
$$f(x,y\mid\theta,\psi) = C(\theta,\psi)\exp\Big[\sum_{i=1}^p Q(\theta_i,\psi)x_i + \sum_{j=1}^q R_j(\psi)y_j\Big]h(x,y),$$
and the dominating measure is Lebesgue measure on $\mathbb{R}^{p+q}$. Suppose also that the loss function is $c_1(\text{ICS}) + c_2|a|$ with $c_2/c_1 < 1/p$. Then for any exchangeable prior, $\delta_2$ is the unique Bayes procedure.

Proof. For any $S'\in\mathcal{B}_S$, $F(S'\mid\phi) = 0$ for some $\phi$ implies $F(S'\mid\phi) = 0$ for all $\phi\in\Phi$. Hence for any prior, $F(\cdot\mid\phi)$ is absolutely continuous with respect to $\Pi$ for every $\phi\in\Phi$. Since the loss function is bounded, for any prior $P$ and any $\delta\in\mathcal{D}$, $B(P,\delta)<\infty$. Hence, for Corollary 1.1 to apply, we need to show that $A^*(s)$ as defined in (1.1) consists of a single point of $A$ a.e. $\Pi$. Without loss of generality assume $h(s)>0$ for all $s\in S$. The posterior density with respect to the prior has the form
$$C(\theta,\psi)\exp\Big[\sum_{i=1}^p Q(\theta_i,\psi)x_i + \sum_{j=1}^q R_j(\psi)y_j\Big]h'(x,y),$$
and $S$ is contained in the natural parameter space of the posterior.

Consider the set $E$ of (real) solutions of $P\{\theta_1 = \max_{j=1,\ldots,p}\theta_j\mid s\} = c_2/c_1$ in $D$, where $D$ is the interior of the natural parameter space. For fixed $(x_2,\ldots,x_p,y_1,\ldots,y_q)$, $P\{\theta_1 = \max_{j=1,\ldots,p}\theta_j\mid s\}$ is analytic in $x_1$. Likewise for every other component of $(x,y)$. This fact implies $\mu(E) = 0$ or $E = D$. We will show this for $p = 2$, $q = 0$. The proof for the general case follows by induction.

Let $D_1$ and $E_1$ be the projections of $D$ and $E$ onto the first axis, respectively. For each $x_1\in D_1$, let $D_{x_1} = \{x_2: (x_1,x_2)\in D\}$ and $E_{x_1} = \{x_2: (x_1,x_2)\in E\}$. If $\mu(E)>0$, then there exist $\epsilon>0$ and $E'_1\subseteq E_1$ such that $m(E'_1)>0$ and $m(E_{x_1})>\epsilon$ for each $x_1\in E'_1$, where $m$ denotes Lebesgue measure on the line. In particular, for each $x_1\in E'_1$, $E_{x_1}$ is uncountable and hence, by analyticity in $x_2$, is the whole of $D_{x_1}$. But then there exists an interval $(a,b)$ such that for each $x_2\in(a,b)$ the set $\{x_1: (x_1,x_2)\in E\}$ is uncountable, for otherwise $E'_1$ would be countable. Hence $\{(x_1,x_2): (x_1,x_2)\in D,\ x_2\in(a,b)\}\subseteq E$. By analytic continuation we have $D = E$.

Suppose $D = E$. Then $P\{\theta_1 = \max_{j=1,\ldots,p}\theta_j\mid s\} = c_2/c_1$ a.e. $\Pi$. By symmetry, $P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} = c_2/c_1$ a.e. $\Pi$ for each $i = 1,\ldots,p$. But then $\sum_{i=1}^p P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} = p(c_2/c_1) < 1$ a.e. $\Pi$, a contradiction. Hence $\mu(E) = 0$, which implies that $A^*(s)$ as defined in (1.1) consists of a single point of $A$ a.e. $\Pi$.

The following figure might clarify the proof.

Figure 1. The Sets D and E.

2.7. Form of Bayes Procedures When the Densities
Have TMLR and the Loss Function Is $c_1(\text{ICS}) + c_2|a|$

We have seen earlier that the concept of Property M is important in the study of Bayes procedures, partially because the densities of $(x,y)$ given $(\theta,\psi)$ having Property M implies that, for any exchangeable prior $P$, the posterior densities of $(\theta,\psi)$ given $(x,y)$ with respect to $P$ have Property M, and hence the posterior distributions have the one-dimensional stochastic monotone property stated in Theorem 2.3.7. Likewise, the concept of Total Monotone Likelihood Ratio (TMLR) is important because the densities of $(x,y)$ given $(\theta,\psi)$ having TMLR implies that, for any prior $P$, the posterior densities of $(\theta,\psi)$ given $(x,y)$ with respect to $P$ have TMLR, and hence the posterior distributions have the Total Stochastic Monotone Property (TSMP). We state this fact formally.

Lemma 2.7.1. Suppose the densities $\{f(x,y\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ have TMLR. Then for any prior $P$ the posterior densities of $(\theta,\psi)$ given $(x,y)$ with respect to $P$ have TMLR, and hence the posterior distributions have TSMP.

Proof. The posterior densities with respect to $P$ are
$$\Big\{f(x,y\mid\theta,\psi)\Big/\int f(x,y\mid\theta',\psi')\,dP(\theta',\psi'): (x,y)\in S\Big\}\quad\text{a.e. } \Pi.$$
They have TMLR because of the symmetry in $(x,y)$ and $(\theta,\psi)$ in the definition of TMLR (the normalizing factor depends only on $(x,y)$ and cancels). The rest follows from Theorem 2.3.1.

Theorem 2.7.1. Suppose the densities $\{f(x,y\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ have TMLR and the loss function $L(\phi,a) = c_1(\text{ICS}) + c_2|a|$ is such that $c_2/c_1 < 1/p$. Then the nonrandomized Bayes procedure $\delta_2$ defined in Section 2.5, $\delta_2(s,a^*) = 1$ where $a^* = \{i: P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} > c_2/c_1\}$, is monotone.

Proof. $I_{\{\theta_i = \max_{j=1,\ldots,p}\theta_j\}}$ is $\nearrow\{i\}$. By the previous lemma and Corollary 2.3.1, $P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid x,y\}$ is $\nearrow\{i\}$ as a function of $(x,y)$. Hence $\delta_2$ is monotone.
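For $p = 2$ the monotonicity in Theorem 2.7.1 can be checked directly under a hypothetical normal model (an assumption, not part of the theorem): $x_i\sim N(\theta_i,s^2)$ independently, $\theta_i$ i.i.d. $N(0,\tau^2)$, so that $P\{\theta_1 = \max_j\theta_j\mid x\} = \Phi\big(w(x_1-x_2)/\sqrt{2ws^2}\big)$ with $w = \tau^2/(\tau^2+s^2)$. The sketch moves $(x_1,x_2)$ up in $\le_{\{1\}}$ and counts how often the selection indicator $\delta_{2,1}$ decreases; the parameter `c` stands for $c_2/c_1 < 1/2$ and its value is illustrative.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def delta2_select_1(x1, x2, s2=1.0, tau2=4.0, c=0.3):
    """Indicator that delta_2 selects population 1 when p = 2 (hypothetical normal model)."""
    w = tau2 / (tau2 + s2)
    prob = norm_cdf(w * (x1 - x2) / math.sqrt(2 * w * s2))   # P{theta_1 = max | x}
    return 1 if prob > c else 0

rng = random.Random(2)
violations = 0
for _ in range(5000):
    x1, x2 = rng.gauss(0, 2), rng.gauss(0, 2)
    y1, y2 = x1 + rng.random(), x2 - rng.random()   # (x1, x2) <=_{1} (y1, y2)
    if delta2_select_1(x1, x2) > delta2_select_1(y1, y2):
        violations += 1
print(violations)   # 0
```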

However, if the setup of the problem is symmetric, then the condition that $c_2/c_1 < 1/p$ is not needed.

Theorem 2.7.2. Suppose the setup of the problem is the symmetric setup of Section 2.2, the densities $\{f(x,y\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ have TMLR and the loss function is $c_1(\text{ICS}) + c_2|a|$. Then for any exchangeable prior, the Bayes procedure $\delta_3$ defined in Section 2.5,
$$\delta_3(s,a^*) = 1,\quad\text{where } a^* = \Big\{i: P\{\theta_i = \max_{j=1,\ldots,p}\theta_j\mid s\} > c_2/c_1\Big\},\quad\text{if } a^*\ne\emptyset,$$
$$\delta_3(s,\{i\}) = 1/N(x)\ \text{ for each } i \text{ such that } x_i = \max_{j=1,\ldots,p}x_j,\ \text{ where } N(x) = \sum_{i=1}^p I_{\{x_i = \max_j x_j\}},\quad\text{if } a^* = \emptyset,$$
is monotone.


We have the following trivial corollary.

Corollary 2.7.1. If the hypothesis of Theorem 2.7.1 (Theorem 2.7.2) is satisfied, and if for a prior (exchangeable prior) there is a unique Bayes procedure, then that Bayes procedure is monotone.

We are now in a position to prove the following essentially complete class theorem.

Theorem 2.7.3. Suppose the setup of the problem is the symmetric setup of Section 2.2,
$$f(x,y\mid\theta,\psi) = C(\theta,\psi)\exp\Big[\sum_{i=1}^p Q(\theta_i,\psi)x_i + \sum_{j=1}^q R_j(\psi)y_j\Big]h(x,y),$$
where, for fixed $\psi$, $Q$ is nondecreasing in $\theta_i$, and the dominating measure $\mu$ is Lebesgue measure on $\mathbb{R}^{p+q}$. Suppose also that the loss function is $c_1(\text{ICS}) + c_2|a|$ with $c_2/c_1 < 1/p$. Then, relative to $\mathcal{D}$, the class of monotone invariant procedures forms an essentially complete class.

Proof. Since the setup of the problem is symmetric, Theorem 2.2.1 implies that the closure of the class of Bayes procedures relative to $\mathcal{D}$ for exchangeable priors forms an essentially complete class relative to $\mathcal{D}$. By Theorem 2.6.1, for each exchangeable prior, $\delta_2$ is the unique Bayes procedure. $\delta_2$ is of course invariant. By Theorem 2.3.3, $\{f(\cdot,\cdot\mid\theta,\psi): (\theta,\psi)\in\Phi\}$ has TMLR. So by Theorem 2.7.1, $\delta_2$ is monotone. Thus it remains to prove that limits of monotone procedures are monotone.

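Suppose a net $\{\delta^\nu\}$ of monotone procedures converges to $\delta$. Suppose that $\delta$ is not monotone, that is, there exist $i$, $S_1$, $S_2$ with $\mu(S_1) > 0$, $\mu(S_2) > 0$, $s_1 \le s_2$ for all $s_1 \in S_1$, $s_2 \in S_2$, and

$$\operatorname*{ess\,sup}_{s \in S_1}\delta_i(s) > \operatorname*{ess\,inf}_{s' \in S_2}\delta_i(s').$$

It is easy to check that this implies there exist $W_1 \subset S_1$ and $W_2 \subset S_2$ with $0 < \mu(W_1), \mu(W_2) < \infty$, such that $\delta_i(s_1) > \delta_i(s_2) + \epsilon$ for some $\epsilon > 0$ and all $s_1 \in W_1$, $s_2 \in W_2$.

Now, since each $\delta^\nu$ is monotone,

$$\Big[\int_{W_1}\delta_i^\nu(s)\,d\mu(s)\Big]\mu(W_2) \le \operatorname*{ess\,sup}_{s \in W_1}\delta_i^\nu(s)\,\mu(W_1)\mu(W_2) \le \operatorname*{ess\,inf}_{s \in W_2}\delta_i^\nu(s)\,\mu(W_1)\mu(W_2) \le \Big[\int_{W_2}\delta_i^\nu(s)\,d\mu(s)\Big]\mu(W_1).$$

Since $\int_{W_1}\delta_i^\nu\,d\mu \to \int_{W_1}\delta_i\,d\mu$ and $\int_{W_2}\delta_i^\nu\,d\mu \to \int_{W_2}\delta_i\,d\mu$, we have

$$\Big[\int_{W_1}\delta_i(s)\,d\mu(s)\Big]\mu(W_2) \le \Big[\int_{W_2}\delta_i(s)\,d\mu(s)\Big]\mu(W_1).$$

But $\delta_i(s_1) > \delta_i(s_2) + \epsilon$ for all $s_1 \in W_1$, $s_2 \in W_2$ implies $\operatorname*{ess\,inf}_{s \in W_1}\delta_i(s) \ge \operatorname*{ess\,sup}_{s \in W_2}\delta_i(s) + \epsilon$. Hence

$$\Big[\int_{W_1}\delta_i(s)\,d\mu(s)\Big]\mu(W_2) \ge \operatorname*{ess\,inf}_{s \in W_1}\delta_i(s)\,\mu(W_1)\mu(W_2) \ge \Big[\operatorname*{ess\,sup}_{s \in W_2}\delta_i(s) + \epsilon\Big]\mu(W_1)\mu(W_2) \ge \Big[\int_{W_2}\delta_i(s)\,d\mu(s)\Big]\mu(W_1) + \epsilon\,\mu(W_1)\mu(W_2).$$

Contradiction. Therefore $\delta$ is monotone and the proof is complete.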

Our essentially complete class theorem is not entirely satisfactory in two ways. First, permutation invariance was used; the theorem ought to be true even if the setup is not symmetric. Second, the theorem is an essentially complete class theorem rather than a complete class theorem. By using the strict versions of Total Monotone Likelihood Ratio and the Total Stochastic Monotone Property, it may be possible to give a constructive proof of a complete class theorem. In this connection see Brown, Cohen and Strawderman (197?).


CHAPTER  3 


SIMULATION  RESULTS  ON  SUBSET  SELECTION  PROCEDURES 

In the last two chapters our search for procedures that perform well on the average led us to the investigation of Bayes procedures. But Bayes procedures typically require numerical integration to implement, and sometimes this makes them unsuitable for practical use. Besides, the use of Bayes procedures is by no means universally accepted. So if there are easy-to-use classical procedures whose performance is close to that of Bayes procedures, then these classical procedures ought to be used. This possibility is explored in this chapter for the case of normal populations with normal exchangeable priors. The classical procedures of Gupta type and of Seal type are compared with Bayes procedures in terms of integrated risks. Though the Monte Carlo studies were done for the case $p = 8$ only, the indications are that for each loss function and prior pair there is always a procedure of Gupta type that performs almost as well as the corresponding Bayes procedure, while this is true for Seal type procedures only when the normal prior is very informative compared to the observations. In this connection Chernoff and Yahav (1977) earlier made similar studies for the loss function

$$c_1\Big(\max_{j=1,\dots,p}\theta_j - \max_{i \in a}\theta_i\Big) + c_2\Big(\max_{j=1,\dots,p}\theta_j - \sum_{i \in a}\theta_i/|a|\Big).$$

They also found that Bayes procedures can be approximated closely by Gupta type procedures. Though we do not yet know whether Gupta type procedures are robust against priors, we can recommend their use in the case of normal populations as procedures that have at least some optimality properties.

Notation. In this chapter as well as the next, we shall adopt the following convention. Let $\theta_{[1]} \le \dots \le \theta_{[p]}$ be the ordered components of $\theta$. If there is exactly one $i$ such that $\theta_i = \theta_{[p]}$, then we shall denote $\theta_i = \theta_{[p]}$ by $\theta_i = \max_{j=1,\dots,p}\theta_j$. If more than one $\theta_i$ are tied for $\theta_{[p]}$, then exactly one of these $\theta_i$ is tagged as $\max_{j=1,\dots,p}\theta_j$.

For any subset selection procedure $R$, let $P_\theta\{CS \mid R\}$ denote the probability of a correct selection under $\theta$ when procedure $R$ is used. More precisely, if $\delta(s,a)$ is the probability assigned to the subset $a$ of $\{1,\dots,p\}$ by $R$ when $s$ is observed, then $P_\theta\{CS \mid R\}$ is the expected value of

$$\sum_{a \in \mathcal{A}} I_{\{\max_{j=1,\dots,p}\theta_j \in \{\theta_i,\ i \in a\}\}}\,\delta(s,a)$$

under $\theta$, where $I$ denotes the indicator function. Also we shall denote by $E_\theta(S \mid R)$ the expected subset size under $\theta$ when the procedure $R$ is used. If we let $P_\theta(i \mid R)$ be the probability of selecting $i$ under $\theta$ when procedure $R$ is used, that is, $P_\theta(i \mid R)$ is the expected value of $\sum_{a \in \mathcal{A}} I_{\{i \in a\}}\,\delta(s,a)$, then it is easy to see that

$$E_\theta(S \mid R) = \sum_{i=1}^{p}P_\theta(i \mid R).$$
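The identity $E_\theta(S \mid R) = \sum_i P_\theta(i \mid R)$ can be checked by simulation. The sketch below is only an illustration under stated assumptions: it handles non-randomized rules ($\delta(s,a) \in \{0,1\}$), takes $X \sim N(\theta, I)$, tags the first maximal index as the best population, and uses a Gupta type rule with a hypothetical constant $d = 1.5$ as the example rule. The names `gupta_rule` and `mc_performance` are illustrative.

```python
import random

def gupta_rule(x, d):
    """Example non-randomized rule (d is hypothetical): select i iff x_i >= max_j x_j - d."""
    top = max(x)
    return {i for i, xi in enumerate(x) if xi >= top - d}

def mc_performance(theta, rule, n_rep=20000, seed=0):
    """Monte Carlo estimates of P_theta{CS|R}, P_theta(i|R) and E_theta(S|R)
    for a non-randomized rule, with X ~ N(theta, I)."""
    rng = random.Random(seed)
    p = len(theta)
    # the population tagged as max theta_j (first index if there are ties)
    tagged = max(range(p), key=lambda i: theta[i])
    correct = 0
    picked = [0] * p
    total_size = 0
    for _ in range(n_rep):
        x = [rng.gauss(t, 1.0) for t in theta]
        a = rule(x)
        correct += tagged in a
        total_size += len(a)
        for i in a:
            picked[i] += 1
    p_cs = correct / n_rep
    p_i = [c / n_rep for c in picked]
    e_s = total_size / n_rep
    return p_cs, p_i, e_s

p_cs, p_i, e_s = mc_performance([0.0, 0.5, 1.0], lambda x: gupta_rule(x, 1.5))
print(p_cs, e_s, sum(p_i))  # e_s and sum(p_i) agree up to rounding
```

For a non-randomized rule a correct selection is simply the event that the tagged population is selected, so the estimate of $P_\theta\{CS \mid R\}$ coincides with the estimate of $P_\theta(i \mid R)$ for the tagged $i$.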

Consider the following model:

$$(X \mid \mu) \sim N(\mu, I),$$

where $X = (X_1,\dots,X_p)$ and $\mu = (\mu_1,\dots,\mu_p)$ are vectors in $\mathbb{R}^p$ and $I$ is the $p \times p$ identity matrix,

$$\mu \sim N(m\mathbf{1}, rI + sU),$$

where $m$, $r$, $s$ are constants, $r > 0$, $-r/p < s < r$, and $U = \mathbf{1}'\mathbf{1}$ where $\mathbf{1} = (1,\dots,1)$.

The above model is equivalent to

$$(X, \mu) \sim N\left((m\mathbf{1}, m\mathbf{1}), \begin{pmatrix} (1+r)I + sU & rI + sU \\ rI + sU & rI + sU \end{pmatrix}\right).$$

Hence a posteriori $(\mu \mid X) \sim N(\hat\mu, \Sigma_{22.1})$, where

$$\hat\mu = m\mathbf{1} + (rI + sU)[(1+r)I + sU]^{-1}(x - m\mathbf{1}) = (r/(1+r))\,x + \text{a multiple of } \mathbf{1},$$

$$\begin{aligned}
\Sigma_{22.1} &= (rI + sU) - (rI + sU)[(1+r)I + sU]^{-1}(rI + sU) \\
&= (rI + sU) - (rI + sU)(1+r)^{-1}\Big[I - \frac{s}{1+r+ps}U\Big](rI + sU) \\
&= rI - (r^2/(1+r))I + \text{a multiple of } U \\
&= (r/(1+r))I + \text{a multiple of } U.
\end{aligned}$$
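The posterior formulas can be checked numerically. The following sketch, assuming NumPy and arbitrary illustrative values of $p$, $m$, $r$, $s$, computes the exact Gaussian conditional mean and covariance and confirms that they differ from $(r/(1+r))x$ and $(r/(1+r))I$ only by a multiple of $\mathbf{1}$ and of $U$, respectively.

```python
import numpy as np

def posterior_params(x, m, r, s):
    """Posterior mean and covariance of mu given X = x when
    X | mu ~ N(mu, I) and mu ~ N(m 1, rI + sU), U = 1'1 (all-ones matrix)."""
    p = len(x)
    one = np.ones(p)
    I = np.eye(p)
    U = np.ones((p, p))
    cov_mu = r * I + s * U          # Cov(mu), which also equals Cov(X, mu)
    cov_x = (1 + r) * I + s * U     # Cov(X)
    gain = cov_mu @ np.linalg.inv(cov_x)
    mean = m * one + gain @ (x - m * one)
    cov = cov_mu - gain @ cov_mu
    return mean, cov

rng = np.random.default_rng(1)
p, m, r, s = 5, 2.0, 1.5, 0.4   # arbitrary values with r > 0 and -r/p < s < r
x = rng.normal(size=p)
mean, cov = posterior_params(x, m, r, s)
resid_mean = mean - r / (1 + r) * x          # should be a constant vector
resid_cov = cov - r / (1 + r) * np.eye(p)    # should be a constant matrix
print(np.ptp(resid_mean), np.ptp(resid_cov))  # both zero up to rounding
```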

Bayes Procedure.

Recall that the Bayes procedure, denoted by $R_B$, is as follows: Select $i$ if and only if $X_i = \max_{j=1,\dots,p}X_j$ and/or

$$P\{\mu_i = \max_{j=1,\dots,p}\mu_j \mid X\} \ge c_2/c_1.$$

Gupta's Procedure.

The classical procedure studied in Gupta (1956, 1965), denoted by $R_G$, is as follows: Select $i$ if and only if $X_i \ge \max_{j \ne i}X_j - d_G(P^*)$, where $d_G(P^*)$ is just large enough that $\inf_{\mu \in \mathbb{R}^p}P_\mu(CS \mid R_G) \ge P^*$ and $P^*$ $(1/p < P^* < 1)$ is predetermined.
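For the known-variance model $X \sim N(\mu, I)$, the infimum of $P_\mu(CS \mid R_G)$ is attained when all $\mu_i$ are equal (the standard least favorable configuration for this rule), so $d_G(P^*)$ solves $\int\Phi^{p-1}(x+d)\,d\Phi(x) = P^*$. A minimal sketch using only the Python standard library; Simpson's rule on a truncated range and bisection are arbitrary choices of this sketch, not the source's method.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal

def pcs_at_equal_means(d, p, lo=-8.0, hi=8.0, steps=2000):
    """Simpson's rule for int Phi(x + d)^(p-1) phi(x) dx on [lo, hi] (steps must be even)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * _N.cdf(x + d) ** (p - 1) * _N.pdf(x)
    return total * h / 3

def gupta_constant(p_star, p, tol=1e-10):
    """d_G(P*): root of pcs_at_equal_means(d, p) = P*, found by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_at_equal_means(mid, p) >= p_star:
            hi = mid
        else:
            lo = mid
    return hi

for p in (2, 8):
    print(p, gupta_constant(0.9, p))
```

For $p = 2$ the equation reduces to $\Phi(d/\sqrt2) = P^*$, which gives an easy check of the numerics.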




Seal's Procedure.

One particular procedure in the class of procedures studied in Seal (1955), denoted by $R_S$, is as follows: Select $i$ if and only if $X_i \ge \sum_{j \ne i}X_j/(p-1) - d_S(P^*)$, where $d_S(P^*)$ is just large enough that $\inf_{\mu \in \mathbb{R}^p}P_\mu(CS \mid R_S) \ge P^*$ and $P^*$ $(1/p < P^* < 1)$ is predetermined.


We shall try to show that both Gupta type procedures and Seal type procedures are intuitive first approximations to Bayes procedures. First we state a well-known lemma.

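Lemma 3.1. Suppose $V_{1\times p} \sim N(b_{1\times p}, I_{p\times p} + \delta U_{p\times p})$, $-1/p < \delta < 1$. Then there exist $\lambda$ and $a$ such that if

$$W = (W_0, W_1, \dots, W_p) \sim N\left((0, b), \begin{pmatrix} 1 & \lambda\mathbf{1} \\ \lambda\mathbf{1}' & I_{p\times p} \end{pmatrix}\right),$$

then $(W_1 - aW_0, \dots, W_p - aW_0)$ has the same distribution as $V$.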
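Lemma 3.1 can be illustrated by an explicit choice of constants. Matching covariances shows that any $\lambda$, $a$ with $a^2 - 2a\lambda = \delta$ and $\lambda^2 \le 1/p$ (so that the covariance of $W$ is positive semidefinite) will do; the particular $\lambda$ below is our own choice, not the source's. The sketch, assuming NumPy, checks the covariance of $(W_1 - aW_0, \dots, W_p - aW_0)$ algebraically; the means match trivially.

```python
import numpy as np

def lemma_constants(p, delta):
    """One admissible (lambda, a) for Lemma 3.1 (our choice): a^2 - 2 a lambda = delta
    and lambda^2 <= 1/p, so that Cov(W) is positive semidefinite."""
    if not -1.0 / p < delta < 1.0:
        raise ValueError("need -1/p < delta < 1")
    lam = np.sqrt((1.0 / p + max(-delta, 0.0)) / 2.0)   # lam^2 lies in [max(-delta, 0), 1/p)
    a = lam + np.sqrt(lam ** 2 + delta)                  # a root of a^2 - 2 lam a - delta = 0
    return lam, a

def check_lemma(p, delta):
    """Max deviation of Cov(W_1 - a W_0, ..., W_p - a W_0) from I + delta U,
    together with the smallest eigenvalue of Cov(W)."""
    lam, a = lemma_constants(p, delta)
    cov_w = np.eye(p + 1)
    cov_w[0, 1:] = lam
    cov_w[1:, 0] = lam
    A = np.hstack([-a * np.ones((p, 1)), np.eye(p)])   # W -> (W_k - a W_0)_k
    cov_v = A @ cov_w @ A.T
    target = np.eye(p) + delta * np.ones((p, p))
    return np.abs(cov_v - target).max(), np.linalg.eigvalsh(cov_w).min()

for p, delta in [(3, -0.3), (5, 0.0), (4, 0.7)]:
    print(p, delta, check_lemma(p, delta))
```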

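As a consequence of this lemma, the expected value of any measurable translation invariant function of $V$ can be computed under $V \sim N(b, I)$ rather than $V \sim N(b, I + \delta U)$. We prove this formally.

Theorem 3.1. Suppose $V \sim N(b, I + \delta U)$, $-1/p < \delta < 1$. If $h$ is measurable and translation invariant, that is, $h(v + c\mathbf{1}) = h(v)$ for all $c \in \mathbb{R}$, then

$$\int_{\mathbb{R}^p}h(v)\,d\Phi_{b,I+\delta U}(v) = \int_{\mathbb{R}^p}h(v)\,d\Phi_{b,I}(v),$$

where $\Phi_{a,B}$ denotes the normal cdf with mean vector $a$ and variance-covariance matrix $B$.

Proof. With $\lambda$, $a$ and $\Sigma = \begin{pmatrix} 1 & \lambda\mathbf{1} \\ \lambda\mathbf{1}' & I_{p\times p} \end{pmatrix}$ as in Lemma 3.1,

$$\begin{aligned}
\int_{\mathbb{R}^p}h(v)\,d\Phi_{b,I+\delta U}(v) &= \int_{\mathbb{R}^{p+1}}h(w_1 - aw_0, \dots, w_p - aw_0)\,d\Phi_{(0,b),\Sigma}(w) \\
&= \int_{\mathbb{R}^{p+1}}h(w_1, \dots, w_p)\,d\Phi_{(0,b),\Sigma}(w) \\
&= \int_{\mathbb{R}^p}h(v)\,d\Phi_{b,I}(v),
\end{aligned}$$

where the second equality uses the translation invariance of $h$ and the third uses the fact that $(W_1, \dots, W_p) \sim N(b, I)$.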
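A quick Monte Carlo illustration of Theorem 3.1, assuming NumPy: for the translation invariant function $h(v) = \max_j v_j - \bar v$ (our example, not one used in the source), the averages under $N(b, I + \delta U)$ and under $N(b, I)$ should agree up to simulation error. The values of $p$, $\delta$, $b$ and the sample size are arbitrary.

```python
import numpy as np

def h(v):
    """Example translation-invariant function: h(v + c 1) = h(v)."""
    return v.max(axis=-1) - v.mean(axis=-1)

rng = np.random.default_rng(2024)
p, delta, n = 4, -0.2, 200_000              # -1/p < delta < 1
b = np.array([0.0, 0.3, 0.6, 1.2])
cov = np.eye(p) + delta * np.ones((p, p))
v_corr = rng.multivariate_normal(b, cov, size=n)   # V ~ N(b, I + delta U)
v_ind = rng.normal(loc=b, size=(n, p))              # V ~ N(b, I)
mean_corr, mean_ind = h(v_corr).mean(), h(v_ind).mean()
print(mean_corr, mean_ind)   # equal up to Monte Carlo error
```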

Let us examine the Bayes procedure more closely. We first note that the set $\{\mu_i = \max_{j=1,\dots,p}\mu_j\}$ is both translation invariant and scale invariant in $\mu$. So

$$\begin{aligned}
P\{\mu_i = \max_{j}\mu_j \mid x\} &= \int I_{\{\mu_i = \max_j\mu_j\}}\,d\Phi_{(r/(1+r))x + \text{multiple of }\mathbf{1},\ (r/(1+r))I + \text{multiple of }U}(\mu) \\
&= \int I_{\{\mu_i = \max_j\mu_j\}}\,d\Phi_{(r/(1+r))x + \text{multiple of }\mathbf{1},\ (r/(1+r))I}(\mu) \quad \text{by Theorem 3.1} \\
&= \int I_{\{\mu_i = \max_j\mu_j\}}\,d\Phi_{(r/(1+r))x,\ (r/(1+r))I}(\mu) \quad \text{by translation invariance} \\
&= \int I_{\{\mu_i = \max_j\mu_j\}}\,d\Phi_{(r/(1+r))^{1/2}x,\ I}(\mu) \quad \text{by scale invariance} \\
&= \int I_{\{\mu_i - \mu_j + (r/(1+r))^{1/2}(x_i - x_j) > 0,\ j \ne i\}}\,d\Phi_{0,I}(\mu). \qquad (3.1)
\end{aligned}$$
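Expression (3.1) makes the posterior probability easy to approximate by simulation, since it involves only $Z \sim N(0, I)$ and the shift $(r/(1+r))^{1/2}x$ and, as the derivation shows, not $m$ or $s$. A minimal standard-library sketch; the sample size, seed and data vector are arbitrary.

```python
import random

def posterior_prob_best(x, r, i, n_rep=100_000, seed=0):
    """Monte Carlo evaluation of (3.1): P{Z_i - Z_j + c(x_i - x_j) > 0 for all j != i}
    with Z ~ N(0, I) and c = (r/(1+r))**0.5."""
    rng = random.Random(seed)
    c = (r / (1 + r)) ** 0.5
    p = len(x)
    hits = 0
    for _ in range(n_rep):
        # y_k = Z_k + c x_k; the event is that y_i exceeds every other y_j
        y = [rng.gauss(0.0, 1.0) + c * xk for xk in x]
        if all(y[i] > y[j] for j in range(p) if j != i):
            hits += 1
    return hits / n_rep

x = [0.2, 1.0, -0.5]
probs = [posterior_prob_best(x, r=1.0, i=i) for i in range(len(x))]
print(probs, sum(probs))
```

Reusing the same seed for every $i$ makes the estimated probabilities sum to one exactly, because each simulated draw has exactly one largest coordinate.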




Hence the Bayes procedure selects $i$ if and only if the $(p-1)$-dimensional vector $(X_i - X_j,\ j \ne i)$ is reasonably large. As a first approximation we may replace $(X_i - X_j,\ j \ne i)$ by $(X_i - \max_{j \ne i}X_j)\mathbf{1}$ or by $(X_i - \sum_{j \ne i}X_j/(p-1))\mathbf{1}$. Now

$$\int I_{\{\mu_i - \mu_j + (r/(1+r))^{1/2}(x_i - \max_{k \ne i}x_k) > 0,\ j \ne i\}}\,d\Phi_{0,I}(\mu) \le (3.1) \le \int I_{\{\mu_i - \mu_j + (r/(1+r))^{1/2}(x_i - \sum_{k \ne i}x_k/(p-1)) > 0,\ j \ne i\}}\,d\Phi_{0,I}(\mu),$$

where the first inequality is obvious, since $x_i - \max_{k \ne i}x_k \le x_i - x_j$ for every $j \ne i$, and the second inequality follows from the result of Marshall and Olkin (1974). So more realistically we would approximate $(x_i - x_j,\ j \ne i)$ by $(x_i - \max_{j \ne i}x_j + c)\mathbf{1}$ and by $(x_i - \sum_{j \ne i}x_j/(p-1) - c')\mathbf{1}$, where $c$ and $c'$ are positive numbers. In any case, the first approximation leads to the procedure of selecting $i$ if and only if $X_i - \max_{j \ne i}X_j$ is reasonably large, which is Gupta's procedure. The second approximation leads to the procedure of selecting $i$ if and only if $X_i - \sum_{j \ne i}X_j/(p-1)$ is reasonably large, which is Seal's procedure.
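The two bounds can be compared with (3.1) by simulation with common random numbers. In the sketch below (standard library only; the data vector, $r$ and sample size are arbitrary illustrations), the lower bound holds draw by draw, while the upper bound, which rests on the Marshall and Olkin (1974) result, holds only for the probabilities themselves, so its estimate is subject to Monte Carlo error.

```python
import random

def three_probs(x, i, r, n_rep=100_000, seed=0):
    """Common-random-number estimates, with Z ~ N(0, I) and c = (r/(1+r))**0.5, of
    lower: shift c (x_i - max_{j != i} x_j) for every j   (Gupta-type replacement)
    exact: shift c (x_i - x_j)                            (expression (3.1))
    upper: shift c (x_i - mean_{j != i} x_j) for every j  (Seal-type replacement)."""
    rng = random.Random(seed)
    c = (r / (1 + r)) ** 0.5
    others = [j for j in range(len(x)) if j != i]
    shift_lo = c * (x[i] - max(x[j] for j in others))
    shift_up = c * (x[i] - sum(x[j] for j in others) / len(others))
    shift_ex = {j: c * (x[i] - x[j]) for j in others}
    counts = [0, 0, 0]
    for _ in range(n_rep):
        z = [rng.gauss(0.0, 1.0) for _ in x]
        diff = {j: z[i] - z[j] for j in others}
        counts[0] += all(diff[j] + shift_lo > 0 for j in others)
        counts[1] += all(diff[j] + shift_ex[j] > 0 for j in others)
        counts[2] += all(diff[j] + shift_up > 0 for j in others)
    return [k / n_rep for k in counts]

print(three_probs([1.0, 0.0, 2.0, -1.0], i=0, r=1.0))
```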

Monte Carlo studies were performed to determine how good these approximations are in terms of integrated risks. The three types of procedures being considered (Bayes procedures, Gupta type, and Seal type) are all translation invariant procedures. The loss function $L(\mu, a)$ for fixed $a$ is both translation and scale invariant in $\mu$. In computing the integrated risks, the following sequence of reductions shows that the integrated risk of any translation invariant procedure $\delta$ is independent of $m$ and $s$:

$$\begin{aligned}
B(\delta, N(m\mathbf{1}, rI + sU)) &= \iint L(\mu, \delta(x))\,d\Phi_{(r/(1+r))x + \text{multiple of }\mathbf{1},\ (r/(1+r))I + \text{multiple of }U}(\mu)\,d\Phi_{m\mathbf{1},\ (1+r)I + sU}(x) \\
&= \iint L(\mu, \delta(x))\,d\Phi_{(r/(1+r))x,\ (r/(1+r))I}(\mu)\,d\Phi_{m\mathbf{1},\ (1+r)I + sU}(x) \quad \text{by Theorem 3.1 and translation invariance} \\
&= \iint L(\mu, \delta(x))\,d\Phi_{(r/(1+r))^{1/2}x,\ I}(\mu)\,d\Phi_{m\mathbf{1},\ (1+r)I + sU}(x) \quad \text{by scale invariance} \\
&= \iint L(\mu, \delta(x))\,d\Phi_{(r/(1+r))^{1/2}x,\ I}(\mu)\,d\Phi_{0,\ (1+r)I}(x) \quad \text{by Theorem 3.1 and translation invariance} \\
&= \iint L(\mu, \delta((r/(1+r))^{-1/2}z))\,d\Phi_{z,\ I}(\mu)\,d\Phi_{0,\ rI}(z).
\end{aligned}$$
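The last line of the reduction suggests how the integrated risk can be simulated: draw $z \sim N(0, rI)$, then $\mu \sim N(z, I)$, and apply $\delta$ to $x = (r/(1+r))^{-1/2}z$. The sketch below assumes NumPy, the loss $c_1(\mathrm{ICS}) + c_2|a|$ with $c_1 = 1$, and a Gupta type rule with a given $d$. It also simulates the risk directly from $N(m\mathbf{1}, rI + sU)$ so that the independence from $m$ and $s$ can be seen numerically. It is an illustration, not the program used for Tables I and II.

```python
import numpy as np

def gupta_select(x, d):
    """Boolean matrix: row k selects i iff x[k, i] >= max_j x[k, j] - d."""
    return x >= x.max(axis=1, keepdims=True) - d

def loss(mu, sel, c_ratio):
    """c1 * I(incorrect selection) + c2 * |a|, with c1 = 1 and c2 = c_ratio."""
    best = mu.argmax(axis=1)
    correct = sel[np.arange(len(mu)), best]
    return (~correct).astype(float) + c_ratio * sel.sum(axis=1)

def risk_reduced(d, r, c_ratio, p=8, n=200_000, seed=0):
    """Integrated risk via the last line of the reduction: z ~ N(0, rI), mu ~ N(z, I)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(scale=np.sqrt(r), size=(n, p))
    mu = z + rng.normal(size=(n, p))
    x = z / np.sqrt(r / (1 + r))
    return loss(mu, gupta_select(x, d), c_ratio).mean()

def risk_direct(d, r, c_ratio, m, s, p=8, n=200_000, seed=1):
    """Integrated risk simulated from mu ~ N(m 1, rI + sU) and X ~ N(mu, I)."""
    rng = np.random.default_rng(seed)
    cov = r * np.eye(p) + s * np.ones((p, p))
    mu = rng.multivariate_normal(m * np.ones(p), cov, size=n)
    x = mu + rng.normal(size=(n, p))
    return loss(mu, gupta_select(x, d), c_ratio).mean()

print(risk_reduced(1.0, 2.0, 0.1, n=100_000),
      risk_direct(1.0, 2.0, 0.1, m=3.0, s=0.5, n=100_000))
```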

For each $(r, c_2/c_1)$ pair, $r$ and $c_2/c_1$ indexing the priors and the loss functions respectively, the best Gupta type procedure and the best Seal type procedure are found by simulation, and their integrated risks are compared with the Bayes risk by simulation. As it turns out, throughout the range of $r$ and $c_2/c_1$ studied, for each $(r, c_2/c_1)$ there is always a Gupta type procedure that performs almost as well as the Bayes procedure, while this is true for Seal type procedures only when $r$ is roughly less than or equal to 1, i.e., only when the prior is very informative compared to the observations. For each $(r, c_2/c_1)$, Table I gives the approximate value of the optimal $d$, the constant associated with the best Gupta type procedure. Table II gives for each $(r, c_2/c_1)$ the simulated integrated risks using the Bayes procedure and the best Gupta type procedure. In this connection Chernoff and Yahav (1977) made similar studies for the loss function

$$c_1\Big(\max_{j=1,\dots,p}\theta_j - \max_{i \in a}\theta_i\Big) + c_2\Big(\max_{j=1,\dots,p}\theta_j - \sum_{i \in a}\theta_i/|a|\Big).$$

They also found that Bayes procedures can be closely approximated by Gupta type procedures. Notice that the values in Table II depend moderately on $c_2/c_1$ but for fixed $c_2/c_1$ are relatively insensitive to $r$. For the loss function they studied, Chernoff and Yahav (1977) also found this phenomenon.
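Finding the best Gupta type procedure for a given $(r, c_2/c_1)$ amounts to minimizing the simulated integrated risk over $d$. A minimal sketch of such a search, assuming NumPy, the loss $c_1(\mathrm{ICS}) + c_2|a|$ with $c_1 = 1$, $p = 8$, and a grid of $d$ values of our choosing. One set of draws is shared across the grid (common random numbers), so differences between grid points are not swamped by independent noise. It does not reproduce the source's Table I.

```python
import numpy as np

def simulated_risks(d_grid, r, c_ratio, p=8, n=100_000, seed=0):
    """Simulated integrated risk of each Gupta type rule on a grid of d, using one common
    set of draws from the reduced representation z ~ N(0, rI), mu ~ N(z, I)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(scale=np.sqrt(r), size=(n, p))
    mu = z + rng.normal(size=(n, p))
    x = z / np.sqrt(r / (1 + r))
    gap = x.max(axis=1, keepdims=True) - x       # max_j x_j - x_i >= 0
    gap_best = gap[np.arange(n), mu.argmax(axis=1)]
    risks = []
    for d in d_grid:
        p_ics = (gap_best > d).mean()            # best population not selected
        size = (gap <= d).sum(axis=1).mean()     # i selected iff x_i >= max x - d
        risks.append(p_ics + c_ratio * size)
    return np.array(risks)

def best_gupta_d(r, c_ratio, d_grid=np.linspace(0.0, 4.0, 81), **kw):
    """Grid point with the smallest simulated risk, and that risk."""
    risks = simulated_risks(d_grid, r, c_ratio, **kw)
    k = int(risks.argmin())
    return d_grid[k], risks[k]

print(best_gupta_d(r=2.0, c_ratio=0.05))
```

With $c_2/c_1 = 1$ every extra population costs as much as the largest possible gain in correct selection, so the search should return $d = 0$; with a very small $c_2/c_1$ it should return a positive $d$.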

The question that has to be answered before Gupta type procedures can be recommended as 'the' procedures to use in all normal population situations is how they perform under priors other than the normal, i.e., are they robust against priors? Simulation studies in these cases become more difficult and have not yet been done. But until they are done we can still recommend the use of Gupta type procedures, since they have at least some optimality properties.


Table II. Lists for Each Prior and Loss Function Pair the Simulated Integrated Risk
of the Bayes Procedure and the Best Gupta Type Procedure in That Order

CHAPTER 4

SOME  ROBUST  AND  NONPARAMETRIC 
SUBSET  SELECTION  PROCEDURES 


The search for good procedures generally has to be carried out individually for each distribution. In the last chapter we found that in the case of normal populations with known variances, Gupta type procedures have some near-optimality properties. Now consider the following location model. Suppose $X_{i\alpha}$, $i = 1,\dots,p$, $\alpha = 1,\dots,n$, are random variables such that their joint distribution is $\prod_{i=1}^{p}\prod_{\alpha=1}^{n}F(x_{i\alpha} - \theta_i)$, where $F$ is only partially known or totally unknown and $\theta = (\theta_1,\dots,\theta_p)$ is an unknown vector in $\mathbb{R}^p$. Suppose we want to select in terms of the $\theta_i$. There is a large body of literature dealing with robust and nonparametric estimation of the location parameter. All the good known estimators are asymptotically normal under reasonable regularity conditions. Hence, intuitively, Gupta type procedures based on these estimators should have good performance. In this chapter we propose two procedures to be used in the $\epsilon$-contaminated normal populations case. These two procedures asymptotically control the probability of a correct selection. We also propose a third procedure to be used in the case where $F$ is absolutely continuous but otherwise unknown. This procedure controls the probability of a correct selection for any sample size. Since it is based essentially on the Hodges-Lehmann estimator, it inherits the high asymptotic relative efficiency of that estimator. It should be pointed out that the problem of nonparametric subset selection is an old one. Some of the earlier references are Lehmann (1963c), Puri and Puri (1969, 1968), McDonald (1960) and Rizvi and Woodworth (1970). The approach here differs from the earlier ones in that direct estimators of the parameter are employed in constructing the procedures.


4.1. $\epsilon$-Contaminated Normal Populations With Scale Known


Let $X_{i\alpha}$, $i = 1,\dots,p$, $\alpha = 1,\dots,n$, be random variables having the joint distribution

$$\prod_{i=1}^{p}\prod_{\alpha=1}^{n}F(x_{i\alpha} - \theta_i),$$

where $F = (1-\epsilon)\Phi + \epsilon H$, $\epsilon$ $(0 \le \epsilon < 1)$ is a known constant, $\Phi$ is the standard normal cdf, $H$ is an unknown symmetric distribution, and $\theta = (\theta_1,\dots,\theta_p)$ is an unknown vector belonging to $\mathbb{R}^p$.

Let $C$ be the class of all $\epsilon$-contaminated distributions, i.e.,

$$C = \{F : F = (1-\epsilon)\Phi + \epsilon H,\ H \text{ a symmetric distribution function}\}.$$

We shall propose two asymptotically equivalent procedures, one based on Huber's maximum likelihood estimator, the other on the trimmed mean. One way to introduce these estimators is as follows.

The distribution $F$ in $C$ having the smallest Fisher information has the density

$$f_0(x) = (2\pi)^{-1/2}(1-\epsilon)e^{-\rho_0(x)},$$

where

$$\rho_0(x) = \begin{cases} \tfrac12 x^2 & \text{for } |x| \le k, \\ k|x| - \tfrac12 k^2 & \text{for } |x| > k, \end{cases}$$

with $k$ depending on $\epsilon$ through

$$\epsilon/(1-\epsilon) = 2\varphi(k)/k - 2\Phi(-k),$$

where $\varphi = \Phi'$ is the standard normal density. Let

$$\psi_0(x) = \rho_0'(x) = \begin{cases} x & \text{for } |x| \le k, \\ k\,\mathrm{sign}(x) & \text{for } |x| > k. \end{cases}$$

We shall denote the maximum likelihood estimator of $\theta_i$ with respect to the above $F_0$ by $M_{in}$. It is the solution of

$$\sum_{\alpha=1}^{n}\psi_0(x_{i\alpha} - \theta_i) = 0.$$
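The constant $k$ and the estimator $M_{in}$ can each be computed with a bisection. In the sketch below (standard library only; brackets and tolerances are arbitrary), `huber_k` uses the fact that $2\varphi(k)/k - 2\Phi(-k)$ is strictly decreasing in $k$ (its derivative is $-2\varphi(k)/k^2$), and `huber_m` uses the fact that $\sum_\alpha\psi_0(x_{i\alpha} - t)$ is nonincreasing in $t$. For $\epsilon = 0.05$ this gives $k \approx 1.40$.

```python
from statistics import NormalDist

_N = NormalDist()

def huber_k(eps, tol=1e-12):
    """Solve eps/(1-eps) = 2 phi(k)/k - 2 Phi(-k) for k > 0 by bisection.
    The right side decreases from +inf (k -> 0) to 0 (k -> inf)."""
    target = eps / (1 - eps)
    lo, hi = 1e-9, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2 * _N.pdf(mid) / mid - 2 * _N.cdf(-mid) > target:
            lo = mid          # right side still too large: root lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)

def psi0(u, k):
    """Huber's psi_0: u for |u| <= k, k * sign(u) otherwise."""
    return max(-k, min(k, u))

def huber_m(sample, k, tol=1e-10):
    """Solve sum psi0(x - t) = 0 in t by bisection; the sum is nonincreasing in t."""
    lo, hi = min(sample), max(sample)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(psi0(x - mid, k) for x in sample) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k = huber_k(0.05)
print(k, huber_m([0.1, -0.4, 0.3, 0.2, 8.0], k))  # the outlier 8.0 has bounded influence
```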

The (asymptotically) best estimator of $\theta_i$ based on linear combinations of order statistics is the trimmed mean

$$\sum_{\beta=1}^{n}h(\beta/(n+1))\,x_{i(\beta)}/n,$$

where $x_{i(\beta)}$ is the $\beta$th order statistic from $x_{i\alpha}$, $\alpha = 1,\dots,n$,

$$h(t) = \begin{cases} \text{constant} & \text{for } F_0(-k) < t < F_0(k), \\ 0 & \text{otherwise}, \end{cases}$$

such that $\sum_{\beta=1}^{n}h(\beta/(n+1))/n = 1$. We shall denote the trimmed mean estimator of $\theta_i$ by $L_{in}$.
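For the least favorable $f_0$, integrating the exponential tail gives $F_0(-k) = (1-\epsilon)\varphi(k)/k$, which by the defining equation for $k$ equals $\epsilon/2 + (1-\epsilon)\Phi(-k)$. Thus $h$ keeps the order statistics whose plotting positions $\beta/(n+1)$ fall strictly between $F_0(-k)$ and $F_0(k) = 1 - F_0(-k)$, and the normalization makes $L_{in}$ the plain average of the kept order statistics. A minimal standard-library sketch; function names are illustrative, and $k$ is assumed to have been computed separately (the value 1.3984 for $\epsilon = 0.05$ is an approximation).

```python
from statistics import NormalDist

_N = NormalDist()

def lower_trim_point(eps, k):
    """F_0(-k) = eps/2 + (1-eps) Phi(-k), valid when k solves the equation for k."""
    return eps / 2 + (1 - eps) * _N.cdf(-k)

def trimmed_mean(sample, alpha):
    """sum_b h(b/(n+1)) x_(b) / n with h constant on (alpha, 1 - alpha) and 0 elsewhere,
    normalized so that sum_b h(b/(n+1))/n = 1, i.e. the mean of the kept order statistics."""
    xs = sorted(sample)
    n = len(xs)
    kept = [x for b, x in enumerate(xs, start=1) if alpha < b / (n + 1) < 1 - alpha]
    if not kept:
        raise ValueError("no order statistics kept; sample too small for this alpha")
    return sum(kept) / len(kept)

alpha = lower_trim_point(0.05, 1.3984)   # approximate k for eps = 0.05
print(alpha, trimmed_mean([0.3, -1.2, 0.8, 0.1, 7.5, -0.4, 0.6, 0.2, -0.1], alpha))
```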

The following results are well known. See Huber (1964), Bickel (1967), and Jaeckel (1971).

Theorem 4.1.1. Under the sole assumption that $H$ is symmetric, $n^{1/2}(M_{in} - \theta_i)$ is asymptotically normal with mean 0. Let $\sigma_M^2(F)$ be the asymptotic variance of $n^{1/2}(M_{in} - \theta_i)$ under $F$; then $\sup_{F \in C}\sigma_M^2(F) = \sigma_0^2$, where

$$\sigma_0^2 = \frac{(1-\epsilon)E_\Phi\psi_0^2 + \epsilon k^2}{[(1-\epsilon)E_\Phi\psi_0']^2}.$$

Under the additional assumption that $H$ has a continuous derivative, $n^{1/2}(L_{in} - \theta_i)$ is asymptotically normal with mean 0. Let $\sigma_L^2(F)$ be the asymptotic variance of $n^{1/2}(L_{in} - \theta_i)$ under $F$; then, under the additional assumption on $H$, $\sup_{F \in C}\sigma_L^2(F) = \sigma_0^2$ also.

Proof. See, for example, Huber (1964).

Let $P^*$ $(1/p < P^* < 1)$ be the desired minimum probability of a correct selection. Let $d$ be the positive number such that

$$\int\Phi^{p-1}(x + d)\,d\Phi(x) = P^*.$$


We now describe the two proposed procedures.

Procedure $R_L(n)$ Based on the Trimmed Mean

Select $i$ if and only if $L_{in} \ge \max_{j=1,\dots,p}L_{jn} - d\sigma_0/n^{1/2}$.

Procedure $R_M(n)$ Based on Huber's Maximum Likelihood Estimator

Select $i$ if and only if $M_{in} \ge \max_{j=1,\dots,p}M_{jn} - d\sigma_0/n^{1/2}$.
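The constant $\sigma_0$ used by both procedures can be evaluated from the formula in Theorem 4.1.1 with $E_\Phi\psi_0' = 2\Phi(k) - 1$ and $E_\Phi\psi_0^2 = 2\Phi(k) - 1 - 2k\varphi(k) + 2k^2\Phi(-k)$. A short calculation with the defining equation for $k$ also gives $\sigma_0^2 = 1/[(1-\epsilon)(2\Phi(k) - 1)]$, which the test checks numerically. The sketch (standard library only; names are illustrative) then applies the common selection rule to a vector of location estimates, either the $L_{in}$ or the $M_{in}$.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def solve_k(eps, tol=1e-13):
    """k > 0 solving eps/(1-eps) = 2 phi(k)/k - 2 Phi(-k); the right side decreases in k."""
    target = eps / (1 - eps)
    lo, hi = 1e-9, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2 * _N.pdf(mid) / mid - 2 * _N.cdf(-mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sigma0_sq(eps, k):
    """sigma_0^2 = [(1-eps) E psi0^2 + eps k^2] / [(1-eps) E psi0']^2, expectations under Phi."""
    e_dpsi = 2 * _N.cdf(k) - 1                                    # E_Phi psi0'
    e_psi2 = e_dpsi - 2 * k * _N.pdf(k) + 2 * k * k * _N.cdf(-k)  # E_Phi psi0^2
    return ((1 - eps) * e_psi2 + eps * k * k) / ((1 - eps) * e_dpsi) ** 2

def select_subset(estimates, d, sigma0, n):
    """R_L(n) or R_M(n): keep i iff estimate_i >= max_j estimate_j - d sigma0 / sqrt(n)."""
    cut = max(estimates) - d * sigma0 / math.sqrt(n)
    return [i for i, t in enumerate(estimates) if t >= cut]

k = solve_k(0.05)
print(k, sigma0_sq(0.05, k))
```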

Theorem 4.1.2. Under the assumption that $H$ is symmetric,

$$\inf_{F \in C}\liminf_{n \to \infty}\inf_{\theta \in \mathbb{R}^p}P_{F,\theta}\{CS \mid R_M(n)\} = P^*.$$

Under the additional assumption that $H$ has a continuous derivative,

$$\inf_{F \in C}\liminf_{n \to \infty}\inf_{\theta \in \mathbb{R}^p}P_{F,\theta}\{CS \mid R_L(n)\} = P^*.$$

Proof. Let $F_n$ be the common distribution of $M_{jn} - \theta_j$, $j = 1,\dots,p$, under $F$. Suppose $\theta_i = \max_{j=1,\dots,p}\theta_j$; then

$$P_{F,\theta}\{CS \mid R_M(n)\} = \int\prod_{j \ne i}F_n(x + d\sigma_0/n^{1/2} - \theta_j)\,dF_n(x - \theta_i),$$

which is nondecreasing in $\theta_i$ and nonincreasing in every other component of $\theta$. Hence

$$\inf_{\theta \in \mathbb{R}^p}P_{F,\theta}\{CS \mid R_M(n)\} = P_{F,\mathbf{0}}\{CS \mid R_M(n)\},$$

where $\mathbf{0} = (0,\dots,0)$. Now

$$\lim_{n \to \infty}P_{F,\mathbf{0}}\{CS \mid R_M(n)\} = \int\Phi^{p-1}(x + d\sigma_0/\sigma_M(F))\,d\Phi(x).$$

But $\sup_{F \in C}\sigma_M^2(F) = \sigma_0^2$. Hence

$$\inf_{F \in C}\liminf_{n \to \infty}P_{F,\mathbf{0}}\{CS \mid R_M(n)\} = \int\Phi^{p-1}(x + d)\,d\Phi(x) = P^*.$$

The proof for $R_L(n)$ is exactly similar.

4.2. F Absolutely Continuous Unknown Case

In this section we consider the case where $F$ is absolutely continuous but otherwise unknown. Let $X_{i\alpha}$, $\alpha = 1, 2, \dots, n$, $i = 1, 2, \dots, p$, be independent random variables such that their joint distribution is $\prod_{i=1}^{p}\prod_{\alpha=1}^{n}F(x_{i\alpha} - \theta_i)$, where $F$ is absolutely continuous but otherwise unknown, and $\theta = (\theta_1,\dots,\theta_p)$, as before, is an unknown vector in $\mathbb{R}^p$.

Let us denote the rank sum of $\{X_{i1}, X_{i2}, \dots, X_{in}\}$ when they are compared with $\{X_{j1}, X_{j2}, \dots, X_{jn}\}$ by $R_i^{(j)}$, and let $D^{(ij)}_{(1)} < \dots < D^{(ij)}_{(n^2)}$ denote the ordered differences $X_{i\alpha} - X_{j\beta}$, $\alpha, \beta = 1,\dots,n$. Since $F$ is absolutely continuous, we can assume these differences to be distinct. Let $W_{ij}(\Delta)$ denote the number of pairs $(\alpha, \beta)$ for which $X_{i\alpha} - X_{j\beta} > \Delta$. In accordance with tradition, we write $W_{ij}(0)$ simply as $W_{ij}$.

j a ■ j j 

The following theorem, stated as Theorem 2.4 in Lehmann (1975), gives the relationship between $D^{(ij)}_{(m)}$ and $W_{ij}(\Delta)$.

Theorem 4.2.1 (Lehmann, 1975). Suppose the differences $X_{i\alpha} - X_{j\beta}$ are distinct. Then for any integer $m$ between 1 and $n^2$ and any real number $\Delta$,

$$D^{(ij)}_{(m)} \le \Delta \quad \text{if and only if} \quad W_{ij}(\Delta) \le n^2 - m, \qquad (4.2.1)$$

$$D^{(ij)}_{(m)} > \Delta \quad \text{if and only if} \quad W_{ij}(\Delta) \ge n^2 - m + 1. \qquad (4.2.2)$$
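Theorem 4.2.1 is easy to confirm numerically: the sketch below (standard library only; the sample size, seed and values of $\Delta$ are arbitrary) sorts the $n^2$ differences, counts $W_{ij}(\Delta)$ directly, and asserts both equivalences for every $m$.

```python
import random

def ordered_diffs(xi, xj):
    """D_(1) < ... < D_(n^2): the sorted differences xi[a] - xj[b]."""
    return sorted(a - b for a in xi for b in xj)

def W(xi, xj, delta):
    """W_ij(delta): the number of pairs (a, b) with xi[a] - xj[b] > delta."""
    return sum(1 for a in xi for b in xj if a - b > delta)

rng = random.Random(7)
n = 6
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
xj = [rng.gauss(0.5, 1.0) for _ in range(n)]
D = ordered_diffs(xi, xj)
for m in range(1, n * n + 1):
    for delta in (-1.0, -0.3, 0.0, 0.4, 1.2):
        w = W(xi, xj, delta)
        assert (D[m - 1] <= delta) == (w <= n * n - m)      # (4.2.1)
        assert (D[m - 1] > delta) == (w >= n * n - m + 1)   # (4.2.2)
print("Theorem 4.2.1 holds for this sample")
```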

Gupta's procedure $R_N(n)$ for the case where $F$ is the normal distribution with unknown variance is as follows:

Select $i$ if and only if $\bar X_i > \bar X_j - d_nS/n^{1/2}$ for all $j$, $j \ne i$,

where $\bar X_i$, $i = 1,\dots,p$, are the sample means, $S$ is the pooled estimate of the standard deviation, and $d_n$ is just large enough that $\inf_{\theta \in \mathbb{R}^p}P_\theta\{CS \mid R_N(n)\} \ge P^*$, $P^*$ predetermined. Notice that this procedure is equivalent to the following: Select $i$ if and only if the $100P^*\%$ simultaneous confidence intervals

$$\{\theta_i - \theta_j < \bar X_i - \bar X_j + d_nS/n^{1/2} \text{ for all } j,\ j \ne i\}$$

cover $\mathbf{0} = (0,\dots,0)$.

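Theorem 4.2.1 enables us to construct nonparametric simultaneous confidence intervals for $(\theta_i - \theta_j,\ j \ne i)$ as follows:

$$\begin{aligned}
P_{F,\theta}\{\theta_i - \theta_j < D^{(ij)}_{(a)} \text{ for all } j,\ j \ne i\} &= P_{F,\theta}\{0 < D^{(ij)}_{(a)} - (\theta_i - \theta_j) \text{ for all } j,\ j \ne i\} \qquad (4.2.3) \\
&= P_{F,\mathbf{0}}\{0 < D^{(ij)}_{(a)} \text{ for all } j,\ j \ne i\} \\
&= P_{F,\mathbf{0}}\{n^2 - a < W_{ij} \text{ for all } j,\ j \ne i\} \\
&= P_{F,\mathbf{0}}\{n^2 - a < R_i^{(j)} - \tfrac12 n(n+1) \text{ for all } j,\ j \ne i\},
\end{aligned}$$

where the subscript $\mathbf{0}$ indicates any $\theta$ with $\theta_1 = \theta_2 = \dots = \theta_p$, and the third equality is (4.2.2) with $\Delta = 0$. Under such a $\theta$ all assignments of the pooled ranks to the $p$ samples are equally likely, so (4.2.3) does not depend on $F$. Hence (4.2.3) can be computed exactly by enumeration.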
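Because under $\theta_1 = \dots = \theta_p$ every assignment of the pooled ranks $1, \dots, pn$ to the $p$ samples is equally likely, (4.2.3) can be enumerated exactly for small $p$ and $n$. The brute-force sketch below (standard library only, exact arithmetic with `Fraction`; feasible only for very small $p$ and $n$) counts $W_{ij}$ as the number of pairs in which a rank from sample $i$ exceeds a rank from sample $j$, which equals $R_i^{(j)} - \tfrac12 n(n+1)$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def rank_assignments(p, n):
    """All ways to split ranks 1..p*n into p labelled groups of size n
    (equally likely when theta_1 = ... = theta_p and F is continuous)."""
    def rec(remaining, k):
        if k == p:
            yield []
            return
        for group in combinations(remaining, n):
            rest = [r for r in remaining if r not in group]
            for tail in rec(rest, k + 1):
                yield [group] + tail
    yield from rec(list(range(1, p * n + 1)), 0)

def mann_whitney(gi, gj):
    """W_ij: number of pairs with a rank of group i above a rank of group j."""
    return sum(1 for a in gi for b in gj if a > b)

def coverage(p, n, a, i=0):
    """Exact P_0{ n^2 - a < W_ij for all j != i } by enumeration."""
    hit = total = 0
    for groups in rank_assignments(p, n):
        total += 1
        if all(n * n - a < mann_whitney(groups[i], groups[j]) for j in range(p) if j != i):
            hit += 1
    return Fraction(hit, total)

def smallest_a(p, n, p_star):
    """a_n: the smallest integer a in 0..n^2 with coverage(p, n, a) >= P*, else None."""
    for a in range(0, n * n + 1):
        if coverage(p, n, a) >= p_star:
            return a
    return None

print([coverage(3, 2, a) for a in range(5)])
```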

We are now in a position to propose procedure $R_H(n)$, the nonparametric analog of Gupta's procedure $R_N(n)$, based on the Hodges-Lehmann estimator.

Procedure $R_H(n)$ Based on the Hodges-Lehmann Estimator

Select $i$ if and only if $0 < D^{(ij)}_{(a_n)}$ for all $j$, $j \ne i$,

or, equivalently,

Select $i$ if and only if $n^2 - a_n < R_i^{(j)} - \tfrac12 n(n+1)$ for all $j$, $j \ne i$,

where $a_n$ is the smallest integer such that

$$P_{F,\mathbf{0}}\{n^2 - a_n < R_i^{(j)} - \tfrac12 n(n+1) \text{ for all } j,\ j \ne i\} \ge P^*.$$
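Procedure $R_H(n)$ needs only pairwise Mann-Whitney counts. A minimal sketch (standard library only; equal sample sizes assumed). The value of $a_n$ must be supplied, for example from an exact enumeration of (4.2.3); the value used in the example is purely illustrative.

```python
def mann_whitney_count(xi, xj):
    """W_ij: number of pairs (a, b) with xi[a] > xj[b]."""
    return sum(1 for a in xi for b in xj if a > b)

def hodges_lehmann_subset(samples, a_n):
    """R_H(n): select i iff n^2 - a_n < W_ij for every j != i
    (equivalently 0 < D^{(ij)}_{(a_n)} for every j != i)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("equal sample sizes assumed")
    selected = []
    for i, xi in enumerate(samples):
        if all(n * n - a_n < mann_whitney_count(xi, xj)
               for j, xj in enumerate(samples) if j != i):
            selected.append(i)
    return selected

# hypothetical data and a_n, for illustration only
print(hodges_lehmann_subset([[5, 6, 7], [1, 2, 3], [4, 8, 9]], a_n=7))
```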

o 

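We now show $\inf_{\theta \in \mathbb{R}^p}P_{F,\theta}\{CS \mid R_H(n)\} \ge P^*$ for any sample size $n$. Suppose without loss of generality $\theta_i = \max_{j=1,\dots,p}\theta_j$. Then, since $\theta_i - \theta_j \ge 0$ for all $j$,

$$\begin{aligned}
\inf_{\theta \in \mathbb{R}^p}P_{F,\theta}\{CS \mid R_H(n)\} &= \inf_{\theta \in \mathbb{R}^p}P_{F,\theta}\{0 < D^{(ij)}_{(a_n)} \text{ for all } j,\ j \ne i\} \\
&\ge \inf_{\theta \in \mathbb{R}^p}P_{F,\theta}\{\theta_i - \theta_j < D^{(ij)}_{(a_n)} \text{ for all } j,\ j \ne i\} \\
&= P_{F,\mathbf{0}}\{0 < D^{(ij)}_{(a_n)} \text{ for all } j,\ j \ne i\} \\
&= P_{F,\mathbf{0}}\{n^2 - a_n < R_i^{(j)} - \tfrac12 n(n+1) \text{ for all } j,\ j \ne i\} \ge P^*.
\end{aligned}$$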

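The asymptotic value of $a_n$ is given by the following theorem.

Theorem 4.2.2.

$$\lim_{n \to \infty}\frac{n^2/2 - a_n}{(n^2(2n+1)/12)^{1/2}} = -d/\sqrt{2},$$

where $d$, as before, is determined by

$$\int\Phi^{p-1}(x + d)\,d\Phi(x) = P^*.$$

Proof.

$$P_{F,\mathbf{0}}\{n^2 - a_n < W_{ij} \text{ for all } j,\ j \ne i\} = P_{F,\mathbf{0}}\Big\{\frac{n^2/2 - a_n}{(n^2(2n+1)/12)^{1/2}} < \frac{W_{ij} - n^2/2}{(n^2(2n+1)/12)^{1/2}} \text{ for all } j,\ j \ne i\Big\}.$$

It is well known that the random vector $\big((W_{ij} - n^2/2)/(n^2(2n+1)/12)^{1/2},\ j \ne i\big)$ under $\theta_1 = \dots = \theta_p$ is asymptotically distributed as $N(0, \tfrac12(I + U))$. A vector with this distribution has the same law as $((Z_i - Z_j)/\sqrt{2},\ j \ne i)$ with $Z \sim N(0, I)$, and $P\{Z_j < Z_i + d \text{ for all } j \ne i\} = \int\Phi^{p-1}(x + d)\,d\Phi(x) = P^*$. Hence $\lim_{n \to \infty}(n^2/2 - a_n)/(n^2(2n+1)/12)^{1/2} = -d/\sqrt{2}$.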
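Rearranged, Theorem 4.2.2 suggests the large-sample approximation $a_n \approx n^2/2 + (d/\sqrt2)(n^2(2n+1)/12)^{1/2}$. The sketch below (standard library only; the quadrature and bisection settings are arbitrary) computes $d$ from $\int\Phi^{p-1}(x+d)\,d\Phi(x) = P^*$ and evaluates this approximation. For $p = 2$ it reduces to the familiar normal approximation for a one-sided Mann-Whitney bound, since then $d = \sqrt2\,\Phi^{-1}(P^*)$. In practice one would presumably round to an integer and, for small $n$, prefer exact enumeration of (4.2.3).

```python
import math
from statistics import NormalDist

_N = NormalDist()

def pcs_equal(d, p, lo=-8.0, hi=8.0, steps=2000):
    """Simpson's rule for int Phi(x + d)^(p-1) dPhi(x)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * _N.cdf(x + d) ** (p - 1) * _N.pdf(x)
    return total * h / 3

def solve_d(p_star, p, tol=1e-10):
    """d > 0 with pcs_equal(d, p) = P*, by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_equal(mid, p) >= p_star:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def a_n_approx(n, d):
    """Theorem 4.2.2 rearranged: a_n ~ n^2/2 + (d/sqrt 2) (n^2 (2n+1)/12)^(1/2)."""
    return n * n / 2 + d / math.sqrt(2) * math.sqrt(n * n * (2 * n + 1) / 12)

d = solve_d(0.95, 4)
for n in (10, 20, 40):
    print(n, a_n_approx(n, d))
```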

The following generalization of Lemma 4 of Lehmann (1963b) enables us to compute the asymptotic relative efficiency of procedure $R_H(n)$.

Lemma 4.2.1. Under the condition $\int f^2(x)\,dx < \infty$, for fixed $i$, as $n \to \infty$ the random vector $n^{1/2}(D^{(ij)}_{(a_n)} - (\theta_i - \theta_j),\ j \ne i)$ has a multivariate normal limiting distribution with mean vector $\big(d/((12)^{1/2}\int f^2(x)\,dx)\big)\mathbf{1}$ and variance-covariance matrix with variances $1/\big(6[\int f^2(x)\,dx]^2\big)$ and covariances $1/\big(12[\int f^2(x)\,dx]^2\big)$.

Proof. Assume without loss of generality that $\theta_i - \theta_j = 0$ for all $j$, $j \ne i$. For any constant vector $(v_j,\ j \ne i)$,

$$P\{v_j < n^{1/2}D^{(ij)}_{(a_n)},\ j \ne i\} = P\{n^2 - a_n < W_{ij}(n^{-1/2}v_j),\ j \ne i\}$$

by Theorem 4.2.1. By Lemma 1 of Lehmann (1963a), the random vector $\big(n^{-3/2}[W_{ij}(n^{-1/2}v_j) - E\,W_{ij}(n^{-1/2}v_j)],\ j \ne i\big)$ is asymptotically normal with mean 0, variances equal to 1/6 and covariances equal to 1/12. Now for any $j$, $j \ne i$,

$$\begin{aligned}
\lim_{n \to \infty}n^{-3/2}\{E[W_{ij}(n^{-1/2}v_j)] - n^2/2\} &= \lim_{n \to \infty}n^{-3/2}n^2\big[P\{X_i - X_j > n^{-1/2}v_j\} - 1/2\big] \\
&= \lim_{n \to \infty}n^{1/2}\int[F(x - n^{-1/2}v_j) - F(x)]\,dF(x) \\
&= -v_j\int f^2(x)\,dx,
\end{aligned}$$

where the last step is justified because $\int f^2(x)\,dx < \infty$; see Olshen (1967) and Mehra and Sarangi (1967). Hence the random vector $\big(n^{-3/2}(W_{ij}(n^{-1/2}v_j) - n^2/2),\ j \ne i\big)$ is asymptotically normal with mean $-\int f^2(x)\,dx\,v$, variances 1/6 and covariances 1/12. By Theorem 4.2.2, $a_n = n^2/2 + (n^2(2n+1)/12)^{1/2}d/\sqrt{2} + o(n^{3/2})$, so

$$\begin{aligned}
\lim_{n \to \infty}P\{v_j < n^{1/2}D^{(ij)}_{(a_n)},\ j \ne i\} &= \lim_{n \to \infty}P\{n^2 - n^2/2 - (n^2(2n+1)/12)^{1/2}d/\sqrt{2} + o(n^{3/2}) < W_{ij}(n^{-1/2}v_j),\ j \ne i\} \\
&= \lim_{n \to \infty}P\Big\{-d + \big[(12)^{1/2}\textstyle\int f^2(x)\,dx\big]v_j + o(n^{3/2})/n^{3/2} \\
&\qquad < (12)^{1/2}n^{-3/2}[W_{ij}(n^{-1/2}v_j) - n^2/2] + \big[(12)^{1/2}\textstyle\int f^2(x)\,dx\big]v_j,\ j \ne i\Big\}.
\end{aligned}$$

By taking limits we get the desired result.


Asymptotic Relative Efficiency of $R_H(n)$ Relative to $R_N(n)$

Consider Gupta's normal means procedure $R_N(n)$ described earlier:

Select $i$ if and only if $\bar X_i - \bar X_j + d_nS/n^{1/2} > 0$ for all $j$, $j \ne i$,

where $\bar X_i$, $i = 1,\dots,p$, are the sample means, $S$ is the pooled estimate of the common standard deviation, and $d_n$ is determined by

$$\int_0^\infty\int_{-\infty}^{\infty}\Phi^{p-1}(x + sd_n)\,d\Phi(x)\,dG_\nu(s) = P^*,$$

where $G_\nu$ is the cdf of $(\chi^2_\nu/\nu)^{1/2}$ with $\nu = p(n-1)$.

Denote by $\sigma^2$ the variance of $F$. We shall assume that $\sigma^2 < \infty$. Under the additional assumption $\int f^2(x)\,dx < \infty$, we see from Lemma 4.2.1 that for any $i$, $\big([(12)^{1/2}\int f^2(x)\,dx]\,n^{1/2}(D^{(ij)}_{(a_n)} - (\theta_i - \theta_j)),\ j \ne i\big)$ and $\big(\sigma^{-1}n^{1/2}(\bar X_i - \bar X_j - (\theta_i - \theta_j) + d_nS/n^{1/2}),\ j \ne i\big)$ have the same limiting distribution. Hence if $n'(n)$ is such that $\lim_{n \to \infty}n'(n)/n = 1/(12\sigma^2[\int f^2(x)\,dx]^2)$, then

$$\lim_{n \to \infty}\frac{P_{F,\theta}\{CS \mid R_H(n')\}}{P_{F,\theta}\{CS \mid R_N(n)\}} = 1 \quad \text{and} \quad \lim_{n \to \infty}\frac{E_{F,\theta}[S \mid R_H(n')]}{E_{F,\theta}[S \mid R_N(n)]} = 1$$

for any $F$, $\theta$. Therefore, if we define the asymptotic relative efficiency $e_{H,N}$ of $R_H$ to $R_N$ as the limit of the reciprocal of the ratio of the sample sizes required for the two procedures to have the same asymptotic performance, where performance is measured in terms of $E(S \mid R)$ for controlled $P\{CS \mid R\}$, $P\{CS \mid R\}$ for controlled $E(S \mid R)$, or a linear combination of $P\{CS \mid R\}$ and $E(S \mid R)$, then

$$e_{H,N} = 12\sigma^2\Big[\int f^2(x)\,dx\Big]^2.$$

It is well known that $12\sigma^2[\int f^2(x)\,dx]^2 \ge 0.864$ for all $f$, and $12\sigma^2[\int f^2(x)\,dx]^2 = 3/\pi \approx 0.955$ when $f$ is the normal density.
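The efficiency $e_{H,N} = 12\sigma^2[\int f^2(x)\,dx]^2$ is easy to evaluate numerically. The sketch below (standard library only; Simpson's rule on truncated ranges chosen by us) evaluates it for four densities with mean zero. The values should agree with the known closed forms $3/\pi$ (normal), $1$ (uniform), $\pi^2/9$ (logistic) and $3/2$ (double exponential), all above the bound 0.864.

```python
import math

def simpson(f, lo, hi, steps=20000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    s = f(lo) + f(hi)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3

def are_hl_vs_normal(pdf, lo, hi):
    """e_{H,N} = 12 sigma^2 (int f^2)^2 for a density with mean 0 supported (essentially) on [lo, hi]."""
    var = simpson(lambda x: x * x * pdf(x), lo, hi)
    int_f2 = simpson(lambda x: pdf(x) ** 2, lo, hi)
    return 12 * var * int_f2 ** 2

densities = {
    "normal": (lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi), -12.0, 12.0),
    "uniform": (lambda x: 1.0, -0.5, 0.5),
    "logistic": (lambda x: math.exp(-abs(x)) / (1 + math.exp(-abs(x))) ** 2, -40.0, 40.0),
    "double exponential": (lambda x: 0.5 * math.exp(-abs(x)), -40.0, 40.0),
}
results = {name: are_hl_vs_normal(f, lo, hi) for name, (f, lo, hi) in densities.items()}
for name, value in results.items():
    print(f"{name:20s} {value:.6f}")
```

The symmetric ranges place a grid point at 0, so the kink of the double exponential density falls on a panel boundary and Simpson's rule stays accurate.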
ABSTRACT

Ranking and selection procedures, subset selection procedures in particular, are procedures that provide in a realistic manner attractive ways of handling problems that are commonly treated by the 2-action procedure of a global F-test and the many-action procedure of a typical multiple range test.

Consider the usual one-way layout situation in analysis of variance. Usually the experimenter wants to know more than just whether all the treatment effects are equal, but he may not want to make inferences concerning all pair-wise differences of means, or all linear contrasts of means. One of the more frequently occurring situations for which this is so is where the experimenter simply wishes to know which of the treatments gives the best product. In this situation, formulating the problem as a selection problem is appropriate. Subset selection procedures are often thought of as screening procedures. If the data indicate that several treatments are better than the remaining treatments but no treatment is clearly the best, then perhaps the experimenter ought to retain all of the better treatments for future consideration.

It  is  generally  recognized  that  for  multivariate  problems  uniformly  best 
procedures  do  not  exist.  Hence  it  is  reasonable  to  look  for  procedures  that 
do  well  on  the  average,  averaged  over  the  parameter  space  by  some  prior.  This 
approach  has  been  taken  in  the  first  two  chapters.  The  essentially  complete 
class  of  Bayes  procedures  and  their  limits  is  investigated.  The  concept  of 
Total  Monotone  Likelihood  Ratio  is  introduced  as  the  multivariate  analog  of 
univariate  monotone  likelihood  ratio.  Then  a multivariate  analog  of  the 
classical  result  of  Karlin  and  Rubin  (1956),  that  monotone  procedures  form  an 
essentially  complete  class,  is  proved  for  a loss  function  which  seems  natural 
to  the  subset  selection  problem  by  proving  that  Bayes  procedures  are  monotone. 

Bayes procedures typically require numerical integrations to implement, and this sometimes makes them unsuitable for practical use. Besides, the use of Bayes procedures is by no means universally accepted. So if an easy-to-implement procedure is available whose performance is close to that of the Bayes procedure, then this procedure ought to be used. This possibility is explored in Chapter 3 for the case of normal populations and normal exchangeable priors. As it turns out, for each prior and loss function pair there is always a Gupta type procedure that performs almost as well as the Bayes procedure, while this is true for Seal type procedures only when the normal prior is very informative. As yet we do not know how these procedures perform when the prior is not normal. Nevertheless we recommend the use of Gupta type procedures when the observations arise from normal distributions, as procedures that have at least some near-optimality properties.

In the case where the parameter of interest is a location parameter and the underlying distribution is not entirely known, there are good robust estimators of the parameter. Under mild regularity conditions they are asymptotically normal. From the results of Chapter 3, we would expect Gupta type procedures based on these estimators to have good asymptotic performance. In Chapter 4 robust and nonparametric Gupta type procedures are proposed. One procedure in particular, the procedure based on simultaneous confidence bounds derived from rank tests, is nonparametric. It controls the probability of a correct selection for any sample size. Since it is based essentially on the Hodges-Lehmann estimator, it inherits the high asymptotic relative efficiency of that estimator.